Conference: International Conference on Information Fusion

Incremental entity fusion from linked documents

Abstract

In many government applications, especially in intelligence and law enforcement, we often find that information about entities, such as persons or even companies, is spread across disparate data sources. For example, information distributed across passports, driving licences, bank accounts, and income tax documents needs to be resolved and fused to reveal a consolidated profile of an individual. In this paper we describe an algorithm that fuses documents highly likely to belong to the same entity by exploiting inter-document references in addition to attribute similarity. Our technique combines iterative graph traversal, locality-sensitive hashing, iterative match-merge, and graph clustering to discover unique entities in a document corpus. Further, new sets of documents can be added incrementally while re-processing only a small subset of a previously fused entity-document collection. We present performance and quality results using both Bayesian likelihood fusion and Support Vector Machines to demonstrate the benefit of using inter-document references, both for improving accuracy and for detecting attempts at deliberate obfuscation.
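
The abstract names the building blocks but gives no implementation details, so the following is only a minimal sketch of how locality-sensitive hashing, inter-document references, and incremental addition might fit together. It is not the authors' algorithm: it substitutes a plain union-find (connected components) for the paper's iterative match-merge and graph clustering, compares only a name attribute, and omits the Bayesian and SVM scoring entirely. The class `IncrementalFuser`, the constants, the document identifiers, and the sample names are all hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 16   # MinHash signature length (illustrative assumption)
BAND_SIZE = 4     # rows per LSH band (illustrative assumption)
THRESHOLD = 0.5   # Jaccard cutoff for an attribute match (illustrative assumption)


def shingles(text, k=3):
    """Return the set of character k-grams of a normalised string."""
    t = text.lower().strip()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def stable_hash(seed, token):
    """Deterministic hash; Python's built-in hash() is salted per process."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(tokens):
    """MinHash signature: one minimum per seeded hash function."""
    return tuple(min(stable_hash(s, t) for t in tokens) for s in range(NUM_HASHES))


def jaccard(a, b):
    return len(a & b) / len(a | b)


class IncrementalFuser:
    """Hypothetical helper: fuses documents into entities one at a time."""

    def __init__(self):
        self.parent = {}                  # union-find forest over document ids
        self.tokens = {}                  # document id -> shingle set
        self.buckets = defaultdict(set)   # LSH band key -> document ids
        self.pending = defaultdict(set)   # unseen referenced id -> referrers

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add(self, doc_id, name, refs=()):
        """Add one document; only documents sharing an LSH bucket are compared."""
        self.parent[doc_id] = doc_id
        self.tokens[doc_id] = shingles(name)
        sig = minhash(self.tokens[doc_id])
        for band in range(0, NUM_HASHES, BAND_SIZE):
            key = (band, sig[band:band + BAND_SIZE])
            for other in self.buckets[key]:
                # Verify each LSH candidate with an exact similarity check.
                if jaccard(self.tokens[doc_id], self.tokens[other]) >= THRESHOLD:
                    self.union(doc_id, other)
            self.buckets[key].add(doc_id)
        for ref in refs:
            if ref in self.parent:
                self.union(doc_id, ref)          # reference to a known document
            else:
                self.pending[ref].add(doc_id)    # resolve when it arrives
        for waiting in self.pending.pop(doc_id, ()):
            self.union(doc_id, waiting)

    def entities(self):
        groups = defaultdict(set)
        for d in self.parent:
            groups[self.find(d)].add(d)
        return sorted(sorted(g) for g in groups.values())


fuser = IncrementalFuser()
fuser.add("passport-1", "John Smith")
fuser.add("licence-7", "John Smith")
fuser.add("bank-3", "J. Smyth", refs=["passport-1"])  # altered name, linked by reference
fuser.add("tax-9", "Alice Wong")
print(fuser.entities())
```

In this toy run the two "John Smith" documents are fused by attribute similarity, while "bank-3" joins the same entity only through its reference to "passport-1", which illustrates how a reference can recover a link that an altered name would otherwise hide. Each call to `add` touches only the buckets that the new document hashes into, which is one simple way to avoid re-processing the whole collection.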
