Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection
Oct 1, 2018
Ioannis Koumarelas
Axel Kroschk
Felix Naumann

Abstract
Given a query record, record matching is the problem of finding database records that represent the same real-world object. In the easiest scenario, a database record is completely identical to the query. In most cases, however, problems arise, for instance as a result of data errors, data integrated from multiple sources, or data received from restrictive form fields. These problems are usually difficult because they require a variety of actions, including field segmentation, decoding of values, and similarity comparisons, each requiring some domain knowledge. In this paper, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure that first enriches each record with a more complete representation of the address information through geocoding and reverse-geocoding, and second selects the best similarity measure for each address attribute, which ultimately helps the classifier achieve the best F-measure. We report on our experience in selecting geocoding services and discovering similarity measures for a concrete but common industry use-case.
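To make the idea of per-attribute similarity measure selection concrete, here is a minimal sketch in Python. It is not the procedure from the paper: the three candidate measures, the threshold grid, the toy record pairs, and all function names are assumptions chosen for illustration, and a simple threshold rule stands in for the classifier. The geocoding enrichment step is not shown. For each attribute, the sketch scores every labeled record pair with every candidate measure and keeps the measure and threshold that give the highest F-measure.

```python
from difflib import SequenceMatcher

# Hypothetical candidate measures; the paper's actual candidates may differ.
def exact(a, b):
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0

def edit_ratio(a, b):
    # Character-level similarity in [0, 1] from the standard library.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_jaccard(a, b):
    # Overlap of whitespace-separated tokens, insensitive to token order.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

MEASURES = {"exact": exact, "edit_ratio": edit_ratio, "token_jaccard": token_jaccard}

def f_measure(predicted, actual):
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_measure_per_attribute(pairs, labels, attributes,
                               thresholds=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Return {attribute: (measure_name, threshold, f_measure)}."""
    best = {}
    for attr in attributes:
        for name, measure in MEASURES.items():
            scores = [measure(left[attr], right[attr]) for left, right in pairs]
            for t in thresholds:
                f1 = f_measure([s >= t for s in scores], labels)
                # Keep the first combination that strictly improves the F-measure.
                if attr not in best or f1 > best[attr][2]:
                    best[attr] = (name, t, f1)
    return best

# Toy labeled pairs (made up for illustration): True means "same address".
pairs = [
    ({"street": "Main Street 5", "city": "Berlin"},
     {"street": "5 Main Street", "city": "Berlin"}),
    ({"street": "Park Avenue 10", "city": "Potsdam"},
     {"street": "Park Avenue 10", "city": "Potsdam"}),
    ({"street": "Main Street 5", "city": "Berlin"},
     {"street": "Lake Road 7", "city": "Berlin"}),
    ({"street": "Hill Road 1", "city": "Hamburg"},
     {"street": "Park Avenue 10", "city": "Potsdam"}),
]
labels = [True, True, False, False]

best = best_measure_per_attribute(pairs, labels, ["street", "city"])
for attr, (name, threshold, f1) in best.items():
    print(f"{attr}: {name} (threshold {threshold}, F-measure {f1:.2f})")
```

On this toy data, exact matching misses the reordered street value, while a more tolerant measure separates the street pairs perfectly. For the city attribute, no measure can tell the non-matching Berlin pair apart, which is one reason a single attribute is rarely enough on its own.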
Publication
In ACM Journal of Data and Information Quality 2018