
Data integration has been a focus of research for many years. At the data level, entity resolution (also known as record deduplication) aims at "cleaning" a database by identifying tuples that represent the same entity. The need for data integration arises from the heterogeneity of data arriving from multiple sources, from the lack of sufficient semantics to fully understand the meaning of the data, and from errors introduced by incorrect data insertion and modification (e.g., typos and omissions). With research spanning multiple decades, entity resolution has a wealth of formal models of integration, algorithmic solutions for efficient and effective integration, and a range of systems, benchmarks and competitions that enable comparative empirical analysis of solutions.
Our research group at the Technion focuses on efficient ways to perform blocking, that is, clustering the dataset into groups of likely matches so that entity resolution only needs to compare records within the same group. We also study uncertainty analysis of the entity resolution process, with an emphasis on data veracity.
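To make the role of blocking concrete, here is a minimal sketch of generic "standard blocking" followed by pairwise matching. It is not the group's method: the blocking key (first three letters of the last name), the `difflib` similarity, the 0.85 threshold, and all function and field names are hypothetical choices made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first three letters of the lower-cased last name.
    return record["last_name"].strip().lower()[:3]


def build_blocks(records):
    """Group record ids by their blocking key."""
    blocks = defaultdict(list)
    for rid, record in records.items():
        blocks[blocking_key(record)].append(rid)
    return blocks


def candidate_pairs(blocks):
    """Yield only pairs sharing a block, instead of all n*(n-1)/2 pairs."""
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


def full_name(record):
    return f"{record['first_name']} {record['last_name']}".lower()


def resolve(records, threshold=0.85):
    """Return candidate pairs whose name similarity meets the threshold."""
    matches = []
    for a, b in candidate_pairs(build_blocks(records)):
        score = SequenceMatcher(None, full_name(records[a]), full_name(records[b])).ratio()
        if score >= threshold:
            matches.append((a, b))
    return matches


records = {
    1: {"first_name": "John", "last_name": "Smith"},
    2: {"first_name": "Jon", "last_name": "Smith"},
    3: {"first_name": "Jane", "last_name": "Doe"},
    4: {"first_name": "John", "last_name": "Smyth"},
}

print(resolve(records))  # [(1, 2)]
```

Only one of the six possible pairs is compared, which is where the efficiency gain comes from. The same example also shows the cost: "John Smyth" lands in a different block, so it is never compared with "John Smith", which is one reason the choice of blocking key matters.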
