Journal: Expert Systems with Applications

Entity reconciliation in big data sources: A systematic mapping study

Abstract

The entity reconciliation (ER) problem has attracted considerable research interest in today's Big Data era, which abounds in large, open, heterogeneous data sources. The problem arises when relevant information on a topic must be obtained using methods based on (i) identifying records that represent the same real-world entity, and (ii) identifying records that are similar but do not correspond to the same real-world entity. ER is an operational intelligence process whereby organizations unify different, heterogeneous data sources in order to relate possible matches of non-obvious entities. On top of the complexity introduced by heterogeneous data sources, factors such as the large number of records and differences among languages must also be taken into account. This paper describes a Systematic Mapping Study (SMS) of journal articles and conference and workshop papers published from 2010 to 2017 that address this problem, first to understand the state of the art and then to identify gaps in current research. Eleven digital libraries were analyzed following a systematic, semiautomatic and rigorous process, which resulted in 61 primary studies. These represent a wide variety of intelligent proposals that aim to solve ER. The conclusions are that most of the research focuses on the operational phase rather than the design phase, and that most studies have been tested on real-world data sources, many of them heterogeneous, but only a few have been applied in industry. There is a clear trend toward techniques based on clustering/blocking and graphs, although the level of automation of the proposals is hardly ever mentioned. (C) 2017 Elsevier Ltd. All rights reserved.
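To make the two ER tasks in the abstract concrete, the following is a minimal Python sketch of a blocking-based approach, one of the technique families the abstract names. It is not taken from the paper or from any of the 61 primary studies. The toy records, the field names, the `blocking_key` and `reconcile` helpers, and both thresholds are assumptions chosen purely for illustration, and string similarity from the standard library's `difflib` stands in for whatever matcher a real system would use.

```python
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; ids, names and cities are illustrative only.
RECORDS = [
    {"id": 1, "name": "Jon Smith", "city": "Boston"},
    {"id": 2, "name": "John Smith", "city": "Boston"},
    {"id": 3, "name": "Joan Smythe", "city": "Boston"},
    {"id": 4, "name": "Maria Garcia", "city": "Madrid"},
    {"id": 5, "name": "María García", "city": "Madrid"},
]


def normalize(text):
    """Lowercase and strip accents so language-specific spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def blocking_key(record):
    """Cheap key (assumed): city plus first letter of the normalized name."""
    return (normalize(record["city"]), normalize(record["name"])[0])


def similarity(a, b):
    """Character-level similarity of the two normalized names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def reconcile(records, match_threshold=0.85, similar_threshold=0.6):
    """Return (matches, similar) as lists of id pairs.

    Only records sharing a blocking key are compared, which avoids
    the quadratic cost of comparing every record with every other.
    """
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)

    matches, similar = [], []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = similarity(a, b)
            pair = (a["id"], b["id"])
            if score >= match_threshold:
                matches.append(pair)      # task (i): same real-world entity
            elif score >= similar_threshold:
                similar.append(pair)      # task (ii): similar but distinct
    return matches, similar


if __name__ == "__main__":
    matches, similar = reconcile(RECORDS)
    print("matches:", matches)
    print("similar:", similar)
```

On this toy data, records 1 and 2 and records 4 and 5 are reported as matches, while record 3 is only flagged as similar to records 1 and 2. The thresholds are arbitrary; in practice they would be tuned against labeled data, and the blocking key would be chosen to balance recall against the number of comparisons.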