Entity matching is the problem of determining whether two entities in a data set refer to the same real-world object. In the last decade, a growing number of large-scale knowledge bases have been created online. Tools for automatically aligning these sources would make it possible to unify them into a single structured knowledge base and to answer complex queries. Here we present Holistic Entity Matching (HolisticEM), an algorithm based on Personalized PageRank for aligning instances in large knowledge bases. It consists of two steps. First, a graph of potential matching pairs is constructed; second, local and global information from the relationship graph is propagated via Personalized PageRank. We demonstrate that HolisticEM performs competitively and can efficiently handle databases with 110M and 203M entities, accurately resolving 1.6M matching entity pairs.
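The two steps can be illustrated with a minimal Python sketch. This is not the HolisticEM implementation: the candidate pairs, the edges between them, the prior weights, and the function name `personalized_pagerank` are all hypothetical choices made for illustration. Each node is a candidate matching pair, an edge links two pairs whose entities are assumed to be related in both knowledge bases, and the prior stands in for a local similarity score. Personalized PageRank is computed by plain power iteration.

```python
from collections import defaultdict


def personalized_pagerank(edges, prior, damping=0.85, iterations=50):
    """Power iteration for Personalized PageRank on an undirected graph.

    edges: iterable of (u, v) node pairs.
    prior: dict mapping node -> non-negative weight (personalization vector).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    nodes = set(neighbors) | set(prior)
    total = sum(prior.values())
    # Normalized restart distribution; nodes without a prior get zero.
    restart = {n: prior.get(n, 0.0) / total for n in nodes}

    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if neighbors[n]:
                # Spread this node's score evenly over its neighbors.
                share = damping * scores[n] / len(neighbors[n])
                for m in neighbors[n]:
                    nxt[m] += share
            else:
                # Isolated node: return its mass to the restart distribution.
                for m in nodes:
                    nxt[m] += damping * scores[n] * restart[m]
        scores = nxt
    return scores


# Step 1 (assumed toy data): candidate pairs are the graph nodes.
paris_paris = ("A:Paris", "B:Paris")
paris_texas = ("A:Paris", "B:Paris_Texas")
france_france = ("A:France", "B:France")

# Edge: both knowledge bases are assumed to relate Paris to France.
edges = [(paris_paris, france_france)]

# Local evidence only: the two Paris candidates look equally plausible.
prior = {paris_paris: 0.5, paris_texas: 0.5, france_france: 0.9}

# Step 2: propagate evidence through the graph of candidate pairs.
scores = personalized_pagerank(edges, prior)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy graph the two candidates for `A:Paris` start with the same prior, but only the pair connected to a well-supported neighbor gains score from propagation, which is the kind of global evidence the abstract refers to.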