Journal: IEEE Transactions on Knowledge and Data Engineering
Efficient Techniques for Online Record Linkage

Abstract

The need to consolidate the information contained in heterogeneous data sources has been widely documented in recent years. In order to accomplish this goal, an organization must resolve several types of heterogeneity problems, especially the entity heterogeneity problem that arises when the same real-world entity type is represented using different identifiers in different data sources. Statistical record linkage techniques could be used for resolving this problem. However, the use of such techniques for online record linkage could pose a tremendous communication bottleneck in a distributed environment (where entity heterogeneity problems are often encountered). In order to resolve this issue, we develop a matching tree, similar to a decision tree, and use it to propose techniques that reduce the communication overhead significantly, while providing matching decisions that are guaranteed to be the same as those obtained using the conventional linkage technique. These techniques have been implemented, and experiments with real-world and synthetic databases show significant reduction in communication overhead.
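The abstract does not spell out the matching tree, so the following is only a simplified illustration of the underlying idea, not the authors' algorithm. It assumes a weighted agreement score with a single match threshold, fetches remote attributes one at a time, and stops as soon as the remaining attributes can no longer change the outcome. The decision is therefore identical to the one from a full comparison, while fewer attribute values cross the network. The attribute names, weights, threshold, and function names are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical per-attribute weights: (weight if values agree, weight if they disagree).
WEIGHTS: Dict[str, Tuple[float, float]] = {
    "ssn_last4": (4.0, -3.0),
    "surname": (2.5, -2.0),
    "birth_year": (1.5, -1.0),
    "zip": (1.0, -0.5),
}
THRESHOLD = 3.0  # hypothetical: a total score >= THRESHOLD is declared a match


def full_decision(local: Dict[str, str], remote: Dict[str, str]) -> bool:
    """Conventional approach: transfer every remote attribute, then score."""
    score = sum(
        WEIGHTS[a][0] if local[a] == remote[a] else WEIGHTS[a][1]
        for a in WEIGHTS
    )
    return score >= THRESHOLD


def sequential_decision(
    local: Dict[str, str], fetch: Callable[[str], str]
) -> Tuple[bool, int]:
    """Fetch remote attributes one by one; stop once the outcome is fixed.

    `fetch(attr)` stands in for a network call returning one remote value.
    Returns (is_match, number_of_attributes_fetched).
    """
    remaining = list(WEIGHTS)
    score = 0.0
    fetched = 0
    while remaining:
        best = score + sum(WEIGHTS[a][0] for a in remaining)
        worst = score + sum(WEIGHTS[a][1] for a in remaining)
        if worst >= THRESHOLD:
            return True, fetched   # match even if everything else disagrees
        if best < THRESHOLD:
            return False, fetched  # non-match even if everything else agrees
        attr = remaining.pop(0)
        fetched += 1
        agree, disagree = WEIGHTS[attr]
        score += agree if local[attr] == fetch(attr) else disagree
    return score >= THRESHOLD, fetched


if __name__ == "__main__":
    local = {"ssn_last4": "1234", "surname": "Lee", "birth_year": "1980", "zip": "75080"}
    remote = dict(local)  # a remote record that agrees on every attribute
    print(sequential_decision(local, remote.__getitem__))
```

In this sketch the attributes are tried in a fixed order, most discriminating first. Choosing the order adaptively, based on the outcomes seen so far, is where a tree-shaped structure would come in.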