Journal: International Journal of Business Intelligence Research

Optimizing the Accuracy of Entity-Based Data Integration of Multiple Data Sources Using Genetic Programming Methods


Abstract

Entity-based data integration (EBDI) is a form of data integration in which information related to the same real-world entity is collected and merged from different sources. The sources often disagree on the value of a common attribute. Such conflicts are typically resolved by invoking a rule that selects one of the non-null values presented by the sources. One of the most commonly used selection rules, the naive selection operator, chooses the non-null value provided by the source with the highest overall accuracy for the attribute in question. However, the naive selection operator does not always produce the most accurate result. This paper describes a method for automatically generating a selection operator using genetic programming. It also presents results from a series of experiments on synthetic data indicating that this method yields a more accurate selection operator than either the naive or the naive-voting selection operator.
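To make the baseline rules concrete, here is a minimal Python sketch of the naive selection operator as the abstract describes it. The abstract does not define the naive-voting operator, so the second function is only one plausible reading (majority vote among non-null values, with ties broken by source accuracy) and should be treated as an assumption. The function names, the use of `None` for null values, and the per-source accuracy list are all hypothetical choices made for illustration, not the paper's implementation.

```python
from collections import Counter
from typing import List, Optional, Sequence


def naive_select(values: Sequence[Optional[str]],
                 accuracies: Sequence[float]) -> Optional[str]:
    """Return the non-null value from the source with the highest accuracy.

    values[i] is the value reported by source i (None means null);
    accuracies[i] is that source's overall accuracy for this attribute.
    """
    candidates = [(acc, i) for i, (v, acc) in enumerate(zip(values, accuracies))
                  if v is not None]
    if not candidates:
        return None  # every source reported null
    # Highest accuracy wins; equal accuracies fall back to the earliest source.
    _, best_index = max(candidates, key=lambda c: (c[0], -c[1]))
    return values[best_index]


def naive_voting_select(values: Sequence[Optional[str]],
                        accuracies: Sequence[float]) -> Optional[str]:
    """ASSUMED reading of 'naive-voting': the most frequent non-null value wins,
    and ties are resolved by the naive rule above."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    top = max(counts.values())
    tied = {v for v, c in counts.items() if c == top}
    # Mask out values that lost the vote, then apply the naive rule to the rest.
    masked: List[Optional[str]] = [v if v in tied else None for v in values]
    return naive_select(masked, accuracies)


# Four sources report a surname; the most accurate source has no value.
reported = ["Smith", None, "Smyth", "Smyth"]
source_accuracy = [0.90, 0.95, 0.70, 0.60]
print(naive_select(reported, source_accuracy))
print(naive_voting_select(reported, source_accuracy))
```

In this made-up example the two rules disagree: the naive rule trusts the single most accurate source that has a value, while the voting variant follows the majority. Neither is guaranteed to be right for every record, which is the gap the generated selection operator is meant to close.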
