EnAli: entity alignment across multiple heterogeneous data sources

Kong Chao; Gao Ming; Xu Chen; Fu Yunbin; Qian Weining; Zhou Aoying

Abstract

Entity alignment is the problem of identifying which entities in one data source refer to the same real-world entity as entities in other sources. Identifying entities across heterogeneous data sources is important to many research fields, such as data cleaning, data integration, information retrieval, and machine learning. The alignment process is expensive for large data sources because it involves all tuples from two or more data sources, and it must also handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model that incorporates heterogeneous entity attributes through the exponential family and handles missing values, and it uses a locality-sensitive hashing scheme to reduce the number of candidate tuples and speed up the alignment process. EnAli is highly accurate and efficient even without any ground-truth tuples. We evaluate EnAli on re-identifying entities within a single data source and on aligning entities across three real data sources. Our experimental results show that the proposed approach outperforms the comparable baseline.
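
The abstract does not say which locality-sensitive hashing scheme EnAli uses, so the following is only a minimal sketch of the general idea of cutting down candidate tuples before any pairwise comparison. It swaps in MinHash with banding over character 3-grams as a stand-in technique; the function names, the record format, and the parameters `num_hashes` and `bands` are all assumptions made for illustration, not details from the paper.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(tokens, num_hashes=32):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return signature


def candidate_pairs(records, num_hashes=32, bands=16):
    """Return pairs of record ids that share at least one LSH band bucket.

    `records` maps a hypothetical record id to a text attribute.
    Only the returned pairs would be passed on to a (costlier) matcher.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rec_id, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            # Records whose signatures agree on every row of a band collide.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs
```

Calling `candidate_pairs({"a1": "Zhou Aoying", "b1": "Zhou Aoying", "c1": "Tech Univ Berlin"})` would place the two identical names in the same buckets, while the third record, which shares no 3-gram with them, is left out of the candidate set. More bands with fewer rows each make the filter more permissive; fewer bands with more rows make it stricter.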

Bibliographic record

Source
Frontiers of computer science in China | 2019, Issue 1 | pp. 157-169 | 13 pages
Authors
Kong Chao; Gao Ming; Xu Chen; Fu Yunbin; Qian Weining; Zhou Aoying
Author affiliations

East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China;

East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China;

Tech Univ Berlin, D-10623 Berlin, Germany;

East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China;

East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China;

East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China;

Original format: PDF
Language: English
Keywords
entity alignment; exponential family; locality sensitive hashing; EM-algorithm;
