Entity Resolution (ER) is a fundamental problem with many applications. Machine learning (ML)-based and rule-based approaches have been widely studied for decades, with much of the effort going into deciding which features/attributes to select, which similarity functions to employ, and which blocking function to use. These choices complicate the deployment of an ER system as a turn-key system. In this paper, we present DeepER, a turn-key ER system powered by deep learning (DL) techniques. The central idea is that distributed representations and representation learning from DL can alleviate the human effort required to tune existing ER systems. DeepER makes several notable contributions: encoding a tuple as a distributed representation of attribute values, building classifiers using these representations and a semantics-aware blocking scheme based on locality-sensitive hashing (LSH), and learning and tuning the distributed representations for ER. We evaluate our algorithms on multiple benchmark datasets and achieve competitive results while requiring minimal interaction with experts.
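To make the two central ideas concrete, the following is a minimal sketch, not the DeepER implementation. It assumes that a tuple is embedded by averaging the word vectors of its attribute values, and that blocking uses random-hyperplane LSH; both are illustrative choices that the abstract does not confirm. The names `TOY_VECTORS`, `embed_tuple`, `make_hyperplanes`, `lsh_signature`, and `block` are hypothetical, and the vectors are made-up toy values standing in for pretrained embeddings.

```python
import random

# Hypothetical 4-dimensional word vectors, for illustration only.
# A real system would load pretrained embeddings instead.
TOY_VECTORS = {
    "apple": [0.9, 0.1, 0.0, 0.2],
    "iphone": [0.8, 0.2, 0.1, 0.3],
    "samsung": [0.1, 0.9, 0.2, 0.1],
    "galaxy": [0.2, 0.8, 0.3, 0.0],
}
DIM = 4


def embed_tuple(record):
    """Average the vectors of all known tokens across all attribute values.

    Assumption: simple averaging; unknown tokens are skipped, and a record
    with no known tokens maps to the zero vector.
    """
    total = [0.0] * DIM
    count = 0
    for value in record.values():
        for token in str(value).lower().split():
            vec = TOY_VECTORS.get(token)
            if vec is None:
                continue
            total = [t + x for t, x in zip(total, vec)]
            count += 1
    if count == 0:
        return total
    return [t / count for t in total]


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals from a standard Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def block(records, hyperplanes):
    """Group record ids by LSH signature; each bucket is one candidate block."""
    buckets = {}
    for rid, record in records.items():
        signature = lsh_signature(embed_tuple(record), hyperplanes)
        buckets.setdefault(signature, []).append(rid)
    return buckets


records = {
    "r1": {"title": "Apple iPhone"},
    "r2": {"brand": "Apple", "model": "iPhone"},
    "r3": {"title": "Samsung Galaxy"},
}
planes = make_hyperplanes(num_bits=8, dim=DIM, seed=1)
buckets = block(records, planes)
```

Only tuples that share a bucket would then be passed to a pairwise classifier. In this sketch `r1` and `r2` always share a bucket, because they contain the same tokens and so receive the same embedding regardless of how the tokens are split across attributes. Where `r3` lands depends on the random hyperplanes.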