Entity Matching Using Different Level Similarity for Different Attributes

Abstract

Entity matching (EM) identifies records, within one database or across different databases, that represent the same real-world entity. It is an essential part of data cleaning and data integration. Most existing studies decide whether two records match by computing the similarity of their corresponding attribute values, and a series of similarity metrics has been proposed for this purpose. However, to the best of our knowledge, no prior work has combined semantic-level similarity with string-level similarity. In some data tables, certain attribute values are long texts, such as product descriptions, and across records that refer to the same entity these values are more similar at the semantic level. Other attributes, such as numeric values (e.g., price) or names (e.g., a person's name), are better suited to string-level similarity. We therefore propose a model that computes the similarity of these two types of attributes with modules at different similarity levels and assigns them different weights. Trained on labeled data, the model effectively solves the entity matching task.
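
To make the idea of routing attributes to different similarity levels concrete, here is a minimal Python sketch. It is not the authors' model, whose modules this abstract does not specify. It assumes a hypothetical `ATTRIBUTE_LEVELS` mapping that sends short attributes (`name`, `price`) to a string-level measure (1 minus normalized Levenshtein distance) and long text (`description`) to a semantic-level measure. The semantic measure here is only a placeholder (cosine over word counts); a realistic semantic module would compare learned text embeddings. The per-attribute weights are learned with plain logistic regression, standing in for whatever weighting the paper actually uses. All function names and records are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical schema: which similarity level each attribute is routed to.
ATTRIBUTE_LEVELS = {"name": "string", "price": "string", "description": "semantic"}

def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def string_sim(a: str, b: str) -> float:
    """String-level similarity: 1 - normalized edit distance, in [0, 1]."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def semantic_sim(a: str, b: str) -> float:
    """PLACEHOLDER for semantic-level similarity: cosine over word counts.
    A real semantic module would compare learned text embeddings instead."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(count * vb[word] for word, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def feature_vector(r1: dict, r2: dict) -> list:
    """One similarity score per attribute, each computed at its assigned level."""
    feats = []
    for attr, level in ATTRIBUTE_LEVELS.items():
        sim = string_sim if level == "string" else semantic_sim
        feats.append(sim(str(r1.get(attr, "")), str(r2.get(attr, ""))))
    return feats

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_weights(pairs, labels, epochs=500, lr=0.5):
    """Learn per-attribute weights by logistic regression (plain SGD)."""
    w, bias = [0.0] * len(ATTRIBUTE_LEVELS), 0.0
    X = [feature_vector(a, b) for a, b in pairs]
    for _ in range(epochs):
        for x, y in zip(X, labels):
            err = sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias

def match_prob(r1: dict, r2: dict, w, bias) -> float:
    """Probability that r1 and r2 refer to the same entity."""
    x = feature_vector(r1, r2)
    return sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x)))

# Toy records, invented for illustration.
r1 = {"name": "Apple iPhone 8", "price": "599.00",
      "description": "smartphone with 64 GB storage and retina display"}
r2 = {"name": "apple iphone 8", "price": "599",
      "description": "retina display smartphone offering 64 GB of storage"}
r3 = {"name": "Samsung Galaxy S9", "price": "719.99",
      "description": "android phone with curved screen"}
r4 = {"name": "Samsung Galaxy S9 ", "price": "719.99",
      "description": "curved screen android phone"}

pairs = [(r1, r2), (r3, r4), (r1, r3), (r2, r4), (r1, r4), (r2, r3)]
labels = [1, 1, 0, 0, 0, 0]
weights, bias = train_weights(pairs, labels)
print([round(match_prob(a, b, weights, bias), 3) for a, b in pairs])
```

Because each attribute contributes exactly one score to the feature vector, replacing `semantic_sim` with an embedding-based cosine would leave the weighting and training steps unchanged.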

Bibliographic Record

Source
IEEE International Conference on Software Engineering and Service Science | 2018 | p. 579 | 4 pages
Authors
Guochao Song; Lei Zhang; Pengfei Wang
Original format: PDF
CLC classification: Computer software
Keywords
Semantics; Numerical models; Training; Databases; Data integration; Vocabulary; Telecommunications

Similar Literature

1. Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity [J]. Vishnu Unnikrishnan, Christian Beyer, Pawel Matuszyk. International Journal of Data Science and Analytics. 2020, no. 1.

2. Integrating Entity and Attribute for Object Similarity [J]. Rui Xie, Zhifeng Hao, Bo Liu. The Open Automation and Control Systems Journal. 2016, no. 1.

3. Quality-aware similarity assessment for entity matching in Web data [J]. Surender Reddy Yerva, Zoltan Miklos, Karl Aberer. Information Systems. 2012, no. 4.

4. Entity Matching Using Different Level Similarity for Different Attributes [C]. Guochao Song, Lei Zhang, Pengfei Wang. IEEE International Conference on Software Engineering and Service Science. 2018.

5. Multi-filter String Matching and Human-centric Entity Matching for Information Extraction [D]. Sun, Chong. 2012.

6. Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling [O]. Suwisa Kaewphan, Kai Hakala, Niko Miekka. 2018.

7. Visual Query Answering by Entity-Attribute Graph Matching and Reasoning [O]. Peixi Xiong, Huayi Zhan, Xin Wang. 2019.

8. Trust Framework for Health Information Exchange: A Framework for Governing Entities and their Participants to Share Trust Attributes to Support Exchange with a Group of Unaffiliated Entities [R]. National HIE Governance Forum. 2013.

