Scalable and distributed methods for entity matching, consolidation and disambiguation over linked data corpora

Aidan Hogan; Antoine Zimmermann; Juergen Umbrich; Axel Polleres; Stefan Decker

Abstract

With respect to large-scale, static, Linked Data corpora, in this paper we discuss scalable and distributed methods for entity consolidation (a.k.a. smushing, entity resolution, object consolidation, etc.) to locate and process names that signify the same entity. We investigate (i) a baseline approach, which uses explicit owl:sameAs relations to perform consolidation; (ii) extended entity consolidation, which additionally uses a subset of OWL 2 RL/RDF rules to derive novel owl:sameAs relations through the semantics of inverse-functional properties, functional properties and (max-)cardinality restrictions with value one; (iii) deriving weighted concurrence measures between entities in the corpus based on shared inlinks/outlinks and attribute values using statistical analyses; (iv) disambiguating (initially) consolidated entities based on inconsistency detection using OWL 2 RL/RDF rules. Our methods are based upon distributed sorts and scans of the corpus, where we deliberately avoid the requirement for indexing all data. Throughout, we offer evaluation over a diverse Linked Data corpus consisting of 1.118 billion quadruples derived from a domain-agnostic, open crawl of 3.985 million RDF/XML Web documents, demonstrating the feasibility of our methods at that scale, and giving insights into the quality of the results for real-world data.
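
The baseline approach in the abstract (consolidation over explicit owl:sameAs relations) can be illustrated with a small in-memory sketch. This is not the paper's implementation: the abstract says the authors rely on distributed sorts and scans, whereas the sketch below swaps in a single-machine union-find structure purely for illustration. The function names `consolidate` and `rewrite`, the `ex:` identifiers, and the choice of the lexicographically smallest identifier as the canonical one are all assumptions made for this example.

```python
def consolidate(same_as_pairs):
    """Map each identifier to one canonical identifier of its
    owl:sameAs equivalence class (illustrative union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited node at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Assumption: the lexicographically smaller root becomes
            # canonical. Any deterministic choice would do here.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in same_as_pairs:
        union(a, b)
    return {x: find(x) for x in list(parent)}


def rewrite(triples, canon):
    """Replace subjects and objects with their canonical identifiers."""
    return [(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples]


# Hypothetical owl:sameAs pairs and data triples.
pairs = [("ex:b", "ex:a"), ("ex:c", "ex:b"), ("ex:x", "ex:y")]
canon = consolidate(pairs)
triples = [("ex:c", "foaf:name", '"Ann"'), ("ex:z", "foaf:knows", "ex:b")]
consolidated = rewrite(triples, canon)
```

Because owl:sameAs is symmetric and transitive, the pairs induce equivalence classes, and rewriting every member of a class to a single identifier merges the data stated about the different names of one entity.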

Bibliographic details

Source
Journal of Web Semantics | 2012, Issue 1 | pp. 76-110 | 35 pages
Authors
Aidan Hogan; Antoine Zimmermann; Juergen Umbrich; Axel Polleres; Stefan Decker
Affiliations

Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland;

INSA-Lyon, LIRIS, UMR5205, Villeurbanne F-69621, France;

Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland;

Siemens AG Oesterreich, Siemensstrasse 90, 1210 Vienna, Austria;

Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland;

Original format: PDF
Language: English
Keywords
entity consolidation; web data; linked data; RDF;

Similar literature

1. Consolidating Heterogeneous Enterprise Data for Named Entity Linking and Web Intelligence [J]. Weichselbraun Albert, Streiff Daniel, Scharl Arno. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms. 2015, Issue 2.
2. Crowd-Guided Entity Matching with Consolidated Textual Data [J]. Zhi-Xu Li, Qiang Yang, An Liu. Journal of Computer Science and Technology (English edition). 2017, Issue 5.
3. Feature-driven linguistic-based entity matching in linked data with application in pharmacy [J]. Zadeh Parisa D. Hossein, Zadeh Mahsa D. Hossein, Reformat Marek Z. Soft Computing: A Fusion of Foundations, Methodologies and Applications. 2017, Issue 2.
4. LINDA: Distributed Web-of-Data-Scale Entity Matching [C]. Christoph Boehm, Gerard de Melo, Felix Naumann. ACM International Conference on Information and Knowledge Management. 2012.
5. Minimization of resource consumption through workload consolidation in large-scale distributed data platforms [D]. Kayyoor, Ashwin Kumar. 2014.
6. Consolidating drug data on a global scale using Linked Data [O]. Milos Jovanovik, Dimitar Trajanov. 2017.
7. Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation over Linked Data Corpora [O]. Aidan Hogan, Antoine Zimmermann, Jürgen Umbrich. 2015.
