Journal: Distributed and Parallel Databases

Resolving Entity on A Large scale: DEtermining Linked Entities and Grouping similar Attributes represented in assorted TErminologies



Abstract

The tremendous growth of the World Wide Web (WWW) has accumulated an abundance of unresolved real-world entities that are exposed through public Web databases. Entity resolution (ER), the task of identifying Web entities that describe the same real-world object, is a vital prerequisite for leveraging them. Data blocking is a popular method for addressing Web entities and grouping similar entity profiles without duplication. Existing ER techniques apply hierarchical blocking to ease dimensionality reduction. Canopy clustering is a pre-clustering method for increasing processing speed. However, it performs a pairwise comparison of the entities, which makes it computationally intensive. Moreover, conventional data-blocking techniques have limited control over both block size and block overlap, despite the significance of blocking quality in many potential applications. This paper proposes Real-Delegate (Resolving Entity on A Large scale: DEtermining Linked Entities and Grouping similar Attributes represented in assorted TErminologies), an approach that exploits attribute-based unsupervised hierarchical blocking as well as meta-blocking without relying on pre-clustering. The proposed approach significantly improves the efficiency of the blocking function in three phases. In the first phase, Real-Delegate links multiple sets of equivalent entity descriptions using Linked Open Data (LOD) to integrate multiple Web sources. The second phase employs attribute-based unsupervised hierarchical blocking with rough set theory (RST), which considerably reduces superfluous comparisons. In the final phase, Real-Delegate eliminates redundant entities by employing a graph-based meta-blocking model that represents redundancy-positive blocks and effectively removes overlapping profiles. The experimental results demonstrate that the proposed approach significantly improves the effectiveness of entity resolution compared with the token blocking method on a large-scale Web dataset.
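
To make the blocking terminology concrete, the following is a minimal Python sketch of the token blocking baseline named in the abstract, followed by a generic graph-based meta-blocking step. It is not the Real-Delegate method: the LOD linking and the RST-based hierarchical blocking are not reproduced. The function names, the sample profiles, the shared-block edge weight, and the mean-weight pruning threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(profiles):
    """Put each profile into one block per token found in any attribute value."""
    blocks = defaultdict(set)
    for pid, attrs in profiles.items():
        for value in attrs.values():
            for token in str(value).lower().split():
                blocks[token].add(pid)
    # A block holding a single profile produces no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def meta_blocking(blocks):
    """Weight each profile pair by its number of shared blocks, then prune.

    Assumption: edges whose weight falls below the mean edge weight are
    discarded; other weighting and pruning schemes are possible.
    """
    weights = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    if not weights:
        return set()
    mean = sum(weights.values()) / len(weights)
    return {pair for pair, w in weights.items() if w >= mean}


# Hypothetical profiles from two sources that use different attribute names.
profiles = {
    "p1": {"name": "Alan Turing", "city": "London"},
    "p2": {"fullName": "Turing Alan", "birthplace": "London"},
    "p3": {"name": "Grace Hopper", "city": "New York"},
    "p4": {"label": "Alan Kay", "city": "York"},
}

blocks = token_blocking(profiles)
candidates = meta_blocking(blocks)
print(sorted(blocks))  # tokens that formed a block
print(candidates)      # pairs kept for detailed comparison
```

In this toy input, token blocking suggests four distinct pairs, some of them repeated across blocks, while the pruned blocking graph keeps only the pair that co-occurs in three blocks. Redundancy-positive blocks are those in which sharing more blocks signals a likelier match, which is the property the edge weight relies on.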