Large-scale linked data integration using probabilistic reasoning and crowdsourcing

Gianluca Demartini; Djellel Eddine Difallah; Philippe Cudré-Mauroux

Abstract

We tackle the problems of semiautomatically matching linked data sets and of linking large collections of Web pages to linked data. Our system, ZenCrowd, (1) uses a three-stage blocking technique in order to obtain the best possible instance matches while minimizing both computational complexity and latency, and (2) identifies entities from natural language text using state-of-the-art techniques and automatically connects them to the linked open data cloud. First, we use structured inverted indices to quickly find potential candidate results from entities that have been indexed in our system. Our system then analyzes the candidate matches and refines them whenever deemed necessary using computationally more expensive queries on a graph database. Finally, we resort to human computation by dynamically generating crowdsourcing tasks in case the algorithmic components fail to come up with convincing results. We integrate all results from the inverted indices, from the graph database and from the crowd using a probabilistic framework in order to make sensible decisions about candidate matches and to identify unreliable human workers. In the following, we give an overview of the architecture of our system and describe in detail our novel three-stage blocking technique and our probabilistic decision framework. We also report on a series of experimental results on a standard data set, showing that our system can achieve a 95% average accuracy on instance matching (as compared to the initial 88% average accuracy of the purely automatic baseline) while drastically limiting the amount of work performed by the crowd. The experimental evaluation of our system on the entity linking task shows an average relative improvement of 14% over our best automatic approach.
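
The three-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not ZenCrowd's implementation: the thresholds, the function names (`decide`, `posterior_from_votes`), the scoring callbacks, and the naive Bayes vote combination are all assumptions made for illustration, since the abstract does not specify the probabilistic framework or any parameter values. The sketch only shows the general idea that cheap stages decide the confident cases and that the crowd is consulted for the remaining uncertain pairs, with each vote weighted by an assumed worker reliability.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Vote = Tuple[float, bool]  # (assumed worker reliability, worker says "match")

# Hypothetical thresholds; the abstract does not state any values.
ACCEPT = 0.9
REJECT = 0.1


def posterior_from_votes(prior: float, votes: List[Vote]) -> float:
    """Combine a prior match probability with crowd votes.

    Reliability is the assumed probability that a worker answers correctly.
    Votes are treated as independent (a naive Bayes simplification).
    """
    p_match, p_no_match = prior, 1.0 - prior
    for reliability, says_match in votes:
        if says_match:
            p_match *= reliability
            p_no_match *= 1.0 - reliability
        else:
            p_match *= 1.0 - reliability
            p_no_match *= reliability
    total = p_match + p_no_match
    return p_match / total if total > 0 else prior


def decide(
    pair: Pair,
    index_score: Callable[[Pair], float],
    graph_score: Callable[[Pair], float],
    ask_crowd: Callable[[Pair], List[Vote]],
) -> Tuple[bool, float, str]:
    """Return (is_match, probability, stage_that_decided)."""
    # Stage 1: cheap lookup (stands in for the inverted index).
    p = index_score(pair)
    if p >= ACCEPT or p <= REJECT:
        return p >= ACCEPT, p, "index"
    # Stage 2: costlier refinement (stands in for graph database queries).
    # Simplification: the refined score replaces the stage 1 score.
    p = graph_score(pair)
    if p >= ACCEPT or p <= REJECT:
        return p >= ACCEPT, p, "graph"
    # Stage 3: only still-uncertain pairs are sent to human workers.
    p = posterior_from_votes(p, ask_crowd(pair))
    return p >= 0.5, p, "crowd"


if __name__ == "__main__":
    # Hypothetical entity identifiers and scores, for illustration only.
    result = decide(
        ("source:e1", "target:e7"),
        index_score=lambda _: 0.6,
        graph_score=lambda _: 0.7,
        ask_crowd=lambda _: [(0.9, True), (0.8, True), (0.6, False)],
    )
    print(result)
```

In this sketch a low-reliability worker who disagrees with two more reliable workers barely moves the posterior, which loosely mirrors the abstract's point that the framework should account for unreliable workers. Estimating the reliabilities themselves is left out here.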


Bibliographic details

Source
The VLDB Journal | 2013, Issue 5 | pp. 665-687 | 23 pages
Authors
Gianluca Demartini; Djellel Eddine Difallah; Philippe Cudré-Mauroux
Author affiliations
eXascale Infolab, University of Fribourg, Fribourg, Switzerland (all three authors)
Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords
Instance matching; Entity linking; Data integration; Crowdsourcing; Probabilistic reasoning

