Ioannis Koumarelas, PhD
Machine Learning
MDedup: Duplicate Detection with Matching Dependencies
Our system uses automatically discovered matching dependencies (MDs), various dataset features, and known gold standards to train a model that selects MDs as duplicate detection rules. Once trained, the model can select useful MDs for duplicate detection on any new dataset.
Ioannis Koumarelas, Thorsten Papenbrock, Felix Naumann
Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection
In this paper, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure that first enriches each record with a more complete representation of the address information through geocoding and reverse-geocoding, and then selects the best similarity measure for each address attribute, which ultimately helps the classifier achieve the best F-measure.
Ioannis Koumarelas, Axel Kroschk, Felix Naumann