Journal: International Journal of Computational Science and Engineering

A data cleaning method for heterogeneous attribute fusion and record linkage

Abstract

In the big data era, massive volumes of heterogeneous data are generated from various data sources, and cleaning dirty data is critical for reliable data analysis. Existing rule-based methods are generally developed for a single-data-source environment; issues such as data standardisation and duplicate detection across attributes of different data types have not been fully studied. To address these challenges, we first introduce a method based on dynamically configurable rules that integrates data detection, modification and transformation. Second, we propose type-based blocking and a varying window size selection mechanism built on the classic sorted-neighbourhood algorithm. We present a reference implementation of our method in a real-life data fusion system and validate its effectiveness and efficiency using recall and precision metrics. Experimental results indicate that our method is well suited to scenarios involving multiple data sources with heterogeneous attribute properties.
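To make the second contribution concrete, the following is a minimal Python sketch of sorted-neighbourhood candidate generation with blocking and an adaptive window. It is not the authors' implementation: the abstract does not say how blocks are formed or how the window size is varied, so the blocking key (`type_of`, the coarse data type of the sort-key value), the growth rule (widen the window while neighbouring keys stay similar), and all parameter names (`min_window`, `max_window`, `grow_threshold`) are assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; difflib is a stand-in for any matcher."""
    return SequenceMatcher(None, a, b).ratio()


def type_of(value) -> str:
    """Hypothetical blocking key: the coarse data type of the value."""
    return "numeric" if isinstance(value, (int, float)) else "text"


def candidate_pairs(records, key, min_window=2, max_window=5, grow_threshold=0.8):
    """Return index pairs worth comparing for duplicate detection.

    Records are first split into blocks by value type (assumed rule), each
    block is sorted on `key`, and a window slides over the sorted block.
    The window starts at `min_window` and grows by one, up to `max_window`,
    each time a neighbour's key is at least `grow_threshold` similar
    (assumed growth rule).
    """
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[type_of(rec[key])].append(idx)

    pairs = set()
    for block, indices in blocks.items():
        if block == "numeric":
            indices.sort(key=lambda i: records[i][key])
        else:
            indices.sort(key=lambda i: str(records[i][key]).lower())

        for pos, i in enumerate(indices):
            window = min_window
            offset = 1
            # A window of size w covers the record and its next w - 1 neighbours.
            while offset < window and pos + offset < len(indices):
                j = indices[pos + offset]
                pairs.add((min(i, j), max(i, j)))
                sim = similarity(str(records[i][key]).lower(),
                                 str(records[j][key]).lower())
                if sim >= grow_threshold and window < max_window:
                    window += 1  # similar neighbour: look one record further
                offset += 1
    return pairs


if __name__ == "__main__":
    demo = [{"name": "John Smith"}, {"name": "Alice Wong"},
            {"name": "Jon Smith"}, {"name": 1001}, {"name": "john smith"}]
    for pair in sorted(candidate_pairs(demo, "name")):
        print(pair)
```

Only the pairs returned here would be passed to a more expensive record comparison. Blocking keeps values of different types from ever being compared, and the adaptive window spends extra comparisons only in regions of the sorted list where near-duplicates cluster. Setting `min_window` equal to `max_window` reduces the sketch to the classic fixed-window sorted-neighbourhood method.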
