Journal of Web Semantics

Scalable and distributed methods for entity matching, consolidation and disambiguation over linked data corpora



Abstract

With respect to large-scale, static Linked Data corpora, in this paper we discuss scalable and distributed methods for entity consolidation (a.k.a. smushing, entity resolution, object consolidation, etc.) to locate and process names that signify the same entity. We investigate (ⅰ) a baseline approach, which uses explicit owl:sameAs relations to perform consolidation; (ⅱ) extended entity consolidation, which additionally uses a subset of OWL 2 RL/RDF rules to derive novel owl:sameAs relations through the semantics of inverse-functional properties, functional properties and (max-)cardinality restrictions with value one; (ⅲ) deriving weighted concurrence measures between entities in the corpus based on shared inlinks/outlinks and attribute values using statistical analyses; (ⅳ) disambiguating (initially) consolidated entities based on inconsistency detection using OWL 2 RL/RDF rules. Our methods are based upon distributed sorts and scans of the corpus, where we deliberately avoid the requirement for indexing all data. Throughout, we offer evaluation over a diverse Linked Data corpus consisting of 1.118 billion quadruples derived from a domain-agnostic, open crawl of 3.985 million RDF/XML Web documents, demonstrating the feasibility of our methods at that scale, and giving insights into the quality of the results for real-world data.
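To make approaches (ⅰ) and (ⅱ) concrete, the following is a minimal, single-machine sketch and not the distributed sort-and-scan method the paper evaluates. It assumes triples are plain Python tuples of strings, merges identifiers linked by explicit owl:sameAs using a union-find structure, and optionally merges subjects that share a value for a property declared inverse-functional. The names `UnionFind` and `consolidate`, and the choice of the lexicographically smallest identifier as the canonical one, are illustrative assumptions.

```python
from collections import defaultdict

OWL_SAME_AS = "owl:sameAs"


class UnionFind:
    """Disjoint-set structure tracking equivalence classes of identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Assumption: the lexicographically smaller root is canonical.
            lo, hi = sorted((ra, rb))
            self.parent[hi] = lo


def consolidate(triples, inverse_functional=frozenset()):
    """Rewrite triples so equivalent identifiers share one canonical name.

    With an empty `inverse_functional` set this is the baseline (explicit
    owl:sameAs only); otherwise subjects sharing a value for an
    inverse-functional property are merged as well.
    """
    uf = UnionFind()
    subjects_by_value = defaultdict(list)  # (property, value) -> subjects
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            uf.union(s, o)
        elif p in inverse_functional:
            subjects_by_value[(p, o)].append(s)
    for subjects in subjects_by_value.values():
        for other in subjects[1:]:
            uf.union(subjects[0], other)

    def canon(term):
        # Terms never involved in a merge are left untouched.
        return uf.find(term) if term in uf.parent else term

    return [(canon(s), p, canon(o)) for s, p, o in triples if p != OWL_SAME_AS]
```

For example, calling `consolidate(triples, {"foaf:mbox"})` on data where two subjects share a mailbox value rewrites both to one identifier, while `consolidate(triples)` applies only the explicit owl:sameAs links. Functional properties and cardinality restrictions, concurrence weighting, and disambiguation are outside the scope of this sketch.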
