Entity resolution framework using rough set blocking for heterogeneous web of data

Vidhya K. A.; Geetha T. V.

Abstract

Entity Resolution (ER) is the process of resolving similar entity descriptions that refer to the same real-world entity, and it is used in data cleaning and data integration. However, existing ER frameworks perform exhaustive pair-wise comparisons, whose number grows quadratically with the number of entities. Even blocking, the most efficient ER method, inherently requires a very large number of pair-wise comparisons on large databases, which makes resolving entities inefficient. Real-world data can be homogeneous or heterogeneous, and ER generally takes two forms: Clean-Clean ER, where the input datasets contain no duplicates, and Dirty ER, where duplicates exist within a dataset. An ER framework comprises two phases: the block building phase, which groups similar entities into a single block for effective indexing, and the block processing phase, which aims to reduce the number of redundant pair-wise comparisons. A further concern is handling entities described by heterogeneous data. In the proposed work, the block building phase gathers related entities with different representations into a single block using an approximation space. For this purpose, a semantic-dominance rough set is used to cluster the attributes of related entities that have varied schemas. The similarity between entities over the clustered attributes is determined with a rough-Jaccard similarity measure, and the entities are grouped into blocks of varied but limited size. Pair-wise comparisons between blocks of entities are carried out only when the blocks have the same lower approximation, as determined by the proposed multi-criteria Pareto optimality; otherwise the entities are not compared, which reduces the overall number of pair-wise comparisons.
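The block building phase described above rests on rough set approximations. As a minimal reminder of the standard (Pawlak) definitions, the Python sketch below partitions a toy set of entities into equivalence classes of entities that agree on chosen attributes, then computes the lower approximation (classes lying wholly inside a target set) and the upper approximation (classes overlapping it). The records, attribute names, and target set are hypothetical. The abstract does not define the semantic-dominance rough set or the rough-Jaccard measure, so this illustrates only the textbook approximation space, not the authors' method. Pawlak's accuracy of approximation, |lower| / |upper|, is included because it has the same ratio form as a Jaccard index; whether it relates to the paper's rough-Jaccard measure is an assumption, not something the source states.

```python
from collections import defaultdict

# Hypothetical toy universe: entity id -> attribute values.
# Attribute names and values are illustrative assumptions only.
ENTITIES = {
    "e1": {"type": "person", "city": "Chennai"},
    "e2": {"type": "person", "city": "Chennai"},
    "e3": {"type": "person", "city": "Madurai"},
    "e4": {"type": "place", "city": "Chennai"},
    "e5": {"type": "place", "city": "Madurai"},
}


def equivalence_classes(entities, attributes):
    """Partition entities by indiscernibility: two entities share a class
    when they agree on every attribute listed in `attributes`."""
    classes = defaultdict(set)
    for eid, record in entities.items():
        key = tuple(record.get(a) for a in attributes)
        classes[key].add(eid)
    return list(classes.values())


def approximations(classes, target):
    """Return the (lower, upper) Pawlak approximations of `target`."""
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:  # class lies entirely inside the target set
            lower |= cls
        if cls & target:   # class overlaps the target set
            upper |= cls
    return lower, upper


def accuracy(lower, upper):
    """Pawlak's accuracy of approximation: |lower| / |upper|."""
    return len(lower) / len(upper) if upper else 1.0


if __name__ == "__main__":
    classes = equivalence_classes(ENTITIES, ["type"])
    target = {"e1", "e2", "e3", "e4"}
    lower, upper = approximations(classes, target)
    print(sorted(lower), sorted(upper), accuracy(lower, upper))
    # → ['e1', 'e2', 'e3'] ['e1', 'e2', 'e3', 'e4', 'e5'] 0.6
```

Grouping by `["type", "city"]` instead yields finer classes, and the same target set then becomes exactly definable (lower and upper approximations coincide, accuracy 1.0).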
The performance of the proposed technique was evaluated on four real-world, highly heterogeneous datasets. Compared with token blocking and attribute clustering methods, the proposed algorithms achieved 99.98% effectiveness and 98.3% efficiency in block comparison.
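Token blocking, one of the baselines named above, is schema-agnostic: every distinct token appearing in any attribute value defines a block holding all entities whose values contain that token. The Python sketch below, on hypothetical records with deliberately different schemas, builds token blocks and contrasts the comparisons they induce (assuming Dirty ER, |b|(|b|-1)/2 per block b) with the n(n-1)/2 comparisons of an exhaustive approach. It also counts distinct pairs, because the same pair can recur across blocks; such repeated comparisons are the redundancy that the block processing phase tries to remove. Lowercased whitespace tokenization is a simplifying assumption, and this is the baseline only, not the proposed rough set method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical heterogeneous records: each source uses its own schema.
RECORDS = {
    "r1": {"name": "Anna University", "city": "Chennai"},
    "r2": {"label": "Anna Univ Chennai"},
    "r3": {"title": "Madras University"},
    "r4": {"label": "IIT Madras"},
    "r5": {"name": "Delhi University", "city": "Delhi"},
}


def tokens(record):
    """Lowercased whitespace tokens from all attribute values (schema-agnostic)."""
    return {tok for value in record.values() for tok in value.lower().split()}


def token_blocking(records):
    """Map each token to the set of record ids whose values contain it."""
    blocks = defaultdict(set)
    for rid, record in records.items():
        for tok in tokens(record):
            blocks[tok].add(rid)
    # A block with a single record yields no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def comparison_stats(records, blocks):
    """Return (exhaustive, per-block, distinct) comparison counts for Dirty ER."""
    n = len(records)
    exhaustive = n * (n - 1) // 2
    # Comparisons executed block by block; a pair may repeat across blocks.
    per_block = sum(len(ids) * (len(ids) - 1) // 2 for ids in blocks.values())
    distinct = {pair for ids in blocks.values()
                for pair in combinations(sorted(ids), 2)}
    return exhaustive, per_block, len(distinct)


if __name__ == "__main__":
    blocks = token_blocking(RECORDS)
    print(comparison_stats(RECORDS, blocks))  # → (10, 6, 5)
```

Here the pair (r1, r2) is compared twice, once in the "anna" block and once in the "chennai" block, while the frequent token "university" alone contributes three comparisons; both effects illustrate why token blocks usually need further processing.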

Bibliographic Information

Source
Journal of intelligent & fuzzy systems: Applications in Engineering and Technology | 2018, No. 1 | 17 pages
Authors
Vidhya K. A.; Geetha T. V.
Affiliations
Anna University, Department of Computer Science, Madras, Tamil Nadu, India
Anna University, Department of Computer Science, Madras, Tamil Nadu, India
Record Information
Original format: PDF
Text language: English
Chinese Library Classification: Automation systems
Keywords
Entity resolution; blocking; rough set; heterogeneous data; linked open data

Similar Documents

1. Entity resolution framework using rough set blocking for heterogeneous web of data [J]. Vidhya K. A., Geetha T. V. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2018, No. 1

2. Parallel meta-blocking for scaling entity resolution over big heterogeneous data [J]. Efthymiou Vasilis, Papadakis George, Papastefanatos George. Information Systems. 2017, issue APRa

3. Data Conflict Resolution among Same Entities in Web of Data [J]. Mojgan Askarizade, Mohammad Ali Nematbakhsh, Enseih Davoodi Jam. BRAIN. Broad Research in Artificial Intelligence and Neurosciences. 2012, No. 3

4. An Ensemble Blocking Approach for Entity Resolution of Heterogeneous Datasets [C]. Janani Balaji, Faizan Javed, Chris Min. International Florida Artificial Intelligence Research Society Conference. 2017

5. Toward better website usage: Leveraging data mining techniques and rough set learning to construct better-to-use websites [D]. Khasawneh, Natheer Yousef. 2005

6. Optimized Dual Threshold Entity Resolution For Electronic Health Record Databases – Training Set Size And Active Learning [O]. Erel Joffe, Michael J. Byrne, Phillip Reeder. 2013

7. Incremental Blocking for Entity Resolution over Web Streaming Data [O]. Tiago Brasileiro Araújo, Kostas Stefanidis, Carlos Eduardo Santos Pires. 2019
