Towards Progressive Search-driven Entity Resolution
Jun 1, 2018
Alberto Pietrangelo
Giovanni Simonini
Sonia Bergamaschi
Felix Naumann
Ioannis Koumarelas

Abstract
Keyword-search systems for databases aim to answer a user query composed of a few terms with a ranked list of records. They are powerful and easy-to-use data exploration tools for a wide range of contexts. For instance, given a product database gathered by scraping e-commerce websites, these systems enable even non-technical users to explore the item set. However, if the database contains dirty records (i.e., incomplete and duplicated records), a preprocessing step to clean the data is required. One fundamental data cleaning step is Entity Resolution, i.e., the task of identifying and fusing together all the records that refer to the same real-world entity. This task is typically executed on the whole data, independently of (i) the portion of the entities that a user may indicate through keywords, and (ii) the order priority that a user might express through an ORDER BY clause.
This paper describes a first step toward solving the problem of progressive search-driven Entity Resolution: resolving all the entities described by a user through a handful of keywords, progressively according to an ORDER BY clause. We discuss the features of our method, named SearchER, and use a demonstrative prototype to showcase examples of keyword queries on two real-world datasets.
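To make the idea of search-driven, progressive resolution concrete, here is a minimal Python sketch. It is not the SearchER method, whose details the abstract does not give. It is a toy illustration under stated assumptions: the record schema (`id`, `title`, `brand`, `price`), the token-level Jaccard matcher, the `threshold` value, and the fusion rules are all hypothetical. The sketch resolves only the records that match the keywords, visits them in ascending ORDER BY order, and yields each fused entity as soon as it is resolved.

```python
from __future__ import annotations

from typing import Iterator


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens of a string."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse(cluster: list[dict], order_by: str) -> dict:
    """Merge duplicate records into one entity (hypothetical fusion rules)."""
    return {
        "ids": [r["id"] for r in cluster],
        # Keep the most descriptive (longest) title.
        "title": max((r["title"] for r in cluster), key=len),
        # Fill a missing brand from any duplicate that has one.
        "brand": next((r["brand"] for r in cluster if r["brand"]), None),
        # Assumes a numeric ORDER BY attribute; keep the smallest value.
        order_by: min(r[order_by] for r in cluster),
    }


def search_driven_er(
    records: list[dict],
    keywords: list[str],
    order_by: str,
    threshold: float = 0.5,
) -> Iterator[dict]:
    """Yield fused entities matching the keywords, in ascending order_by order."""
    query = {k.lower() for k in keywords}
    # Search-driven: only records mentioning a keyword are ever compared.
    candidates = [r for r in records if query & tokens(r["title"])]
    # Progressive: visit candidates in the order the user asked for.
    candidates.sort(key=lambda r: r[order_by])
    resolved: set[int] = set()
    for rec in candidates:
        if rec["id"] in resolved:
            continue
        # Direct matches only (no transitive closure), among unresolved candidates.
        cluster = [
            other
            for other in candidates
            if other["id"] not in resolved
            and jaccard(tokens(rec["title"]), tokens(other["title"])) >= threshold
        ]
        resolved.update(r["id"] for r in cluster)
        yield fuse(cluster, order_by)


# Made-up product records, including incomplete and duplicated ones.
records = [
    {"id": 1, "title": "Canon EOS 80D camera", "brand": "Canon", "price": 899.0},
    {"id": 2, "title": "canon eos 80d digital camera", "brand": None, "price": 879.0},
    {"id": 3, "title": "Nikon D3500 camera", "brand": "Nikon", "price": 449.0},
    {"id": 4, "title": "Nikon D3500 camera kit", "brand": None, "price": 499.0},
    {"id": 5, "title": "Sony WH-1000XM3 headphones", "brand": "Sony", "price": 299.0},
]

results = list(search_driven_er(records, ["camera"], order_by="price"))
for entity in results:
    print(entity)
```

Because the function is a generator, a caller can stop after the first few entities without paying for the resolution of the rest. One limitation of this simplification is that a duplicate whose title contains none of the keywords is never considered, since the keyword filter runs before matching.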
Type
Publication
In Proceedings of the Italian Symposium on Advanced Database Systems (SEBD) 2018