Data Quality

Knowledge Transfer for Entity Resolution with Siamese Neural Networks

We propose a deep Siamese neural network that learns a similarity measure tailored to a dataset, eliminating manual feature engineering. We also show that knowledge transfer …

Michael Loster

MDedup: Duplicate Detection with Matching Dependencies

Our system uses automatically discovered matching dependencies (MDs), dataset features, and known gold standards to train a model that selects MDs as duplicate detection rules.

Ioannis Koumarelas

Data Preparation for Duplicate Detection

We propose the first workflow that systematically integrates data preparation operations before duplicate detection, improving AUC-PR by up to 19%.

Ioannis Koumarelas

Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection

In this paper, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process …

Ioannis Koumarelas