IEEE International Conference on Software Engineering and Service Science

Entity Matching Using Different Level Similarity for Different Attributes


Abstract

Entity matching (EM) identifies records, within one database or across several, that represent the same real-world entity. It is an essential part of data cleaning and data integration. Most existing studies decide whether two records match by calculating the similarity of their corresponding attribute values, and a series of similarity metrics has been proposed for this purpose. However, our investigation found no prior work that combines semantic-level similarity with string-level similarity. In some data tables, certain attribute values are long text, such as product descriptions. For records that refer to the same entity, these values are more similar at the semantic level. Other attributes, such as numbers (e.g., price) or nouns (e.g., a person's name), are better suited to string-level similarity. We therefore propose a model that computes the similarity of these two types of attributes with modules operating at different similarity levels and assigns them different weights. Trained on labeled data, the resulting model effectively solves the entity matching task.
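The abstract does not specify the similarity modules or the training procedure. The Python sketch below illustrates the general idea under several assumptions, none of which come from the paper:

- String-level similarity for short attributes uses `difflib.SequenceMatcher.ratio()`.
- Semantic-level similarity for long-text attributes is approximated by bag-of-words cosine similarity. This is a simple stand-in for a learned semantic module, such as one based on text embeddings.
- The per-attribute weights are learned by logistic regression on labeled record pairs.

All function names and toy records are hypothetical. This is not the authors' model.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """String-level similarity for short attributes such as names or prices."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_similarity(a: str, b: str) -> float:
    """Stand-in for semantic-level similarity of long text attributes.

    Assumption: cosine similarity over bag-of-words counts. A real semantic
    module would compare learned text representations instead.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def pair_features(r1, r2, text_attrs, short_attrs):
    """One score per attribute, computed at the level suited to that attribute."""
    return [semantic_similarity(r1[a], r2[a]) for a in text_attrs] + [
        string_similarity(r1[a], r2[a]) for a in short_attrs
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weights(X, y, lr=0.5, epochs=2000):
    """Learn per-attribute weights and a bias with logistic regression (SGD)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b


def match_probability(w, b, x) -> float:
    """Weighted combination of attribute similarities mapped to [0, 1]."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: "description" is long text; "name" and "price" are short.
TEXT, SHORT = ["description"], ["name", "price"]
hp1 = {"description": "wireless noise cancelling headphones black",
       "name": "Sony WH-1000XM4", "price": "278.00"}
hp2 = {"description": "black wireless headphones with noise cancelling",
       "name": "Sony WH1000XM4", "price": "278.00"}
kt1 = {"description": "stainless steel electric kettle 1.7 litre",
       "name": "Philips HD9350", "price": "45.99"}
kt2 = {"description": "electric kettle stainless steel 1.7 litre capacity",
       "name": "Philips HD-9350", "price": "45.99"}

pairs = [(hp1, hp2, 1), (kt1, kt2, 1), (hp1, kt1, 0), (hp2, kt2, 0)]
X = [pair_features(r1, r2, TEXT, SHORT) for r1, r2, _ in pairs]
y = [label for _, _, label in pairs]
w, bias = train_weights(X, y)
for x, label in zip(X, y):
    print(label, round(match_probability(w, bias, x), 3))
```

Logistic regression is used here only because it is a simple way to learn one weight per attribute from labeled pairs. The paper's actual weighting and matching model may differ.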
