Network theory has progressed a long way since the Erdős-Rényi model, identifying many important real-world phenomena that a good random graph model should capture and producing more realistic models that capture many of them. However, these models are largely limited to simple networks (nodes and links only), leaving other complications outside the realm of theory. In such cases, a practitioner with complicated data must make decisions or apply algorithms to compensate for these issues without the benefit of an underlying model. In this paper, we develop a simple generative model of the entity resolution problem. Noting its similarity to the association problem in data fusion, we develop principled inference equations for entity resolution analogous to those developed for data association. The framework for this effort is a ground-truth model for object states and for the network that links them, together with a Dirichlet process model for how the observed aliases of the objects are distributed among the observed transactions between them. The paper focuses on the derivation of the inference equations, and the result is demonstrated on an illustrative example. Because the framework is based on rigorous probabilistic models, it is particularly well suited to ambiguous scenarios in which no single entity resolution hypothesis stands out as the correct one.