International Journal of Computer Science and Security

Testing Various Similarity Metrics and their Permutations with Clustering Approach in Context Free Data Cleaning

Abstract

Organizations can sustain growth in this knowledge era through proficient data analysis, which relies heavily on data quality. This paper emphasizes the use of sequence similarity metrics with a clustering approach in context-free data cleaning to improve data quality by reducing noise. The authors propose an algorithm that tests the suitability of a value for correcting other values of an attribute, based on the distance between them. Sequence similarity metrics such as Needleman-Wunsch, Jaro-Winkler, Chapman Ordered Name Similarity, and Smith-Waterman are used to find the distance between two values. Experimental results show how the approach can effectively clean the data without reference data.
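
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the most frequent values of an attribute act as cluster centers and that a rarer value is corrected to a center when their Jaro-Winkler similarity reaches a threshold. The function names (`jaro`, `jaro_winkler`, `clean_values`), the frequency-based choice of centers, the threshold of 0.85, and the sample city names are all illustrative assumptions.

```python
from collections import Counter


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they sit within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def clean_values(values, threshold=0.85):
    """Map each value to its nearest, more frequent cluster center.

    Assumption (not from the paper): frequent values are treated as
    correct, and rarer values close enough to one are treated as noise.
    """
    counts = Counter(values)
    centers = []
    mapping = {}
    # Visit values from most to least frequent; ties break alphabetically.
    for value, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        best = max(centers, key=lambda c: jaro_winkler(value, c), default=None)
        if best is not None and jaro_winkler(value, best) >= threshold:
            mapping[value] = best   # close to an existing center: correct it
        else:
            centers.append(value)   # otherwise it starts a new cluster
            mapping[value] = value
    return [mapping[v] for v in values]


cities = ["Ahmedabad", "Ahmedabad", "Ahmedabad", "Ahmadabad",
          "Mumbai", "Mumbai", "Mumbay"]
cleaned = clean_values(cities)
print(cleaned)
```

In this sketch no reference dictionary is consulted: the attribute's own value distribution decides which spellings survive, which is one way to read "context-free" cleaning. Any of the other metrics named in the abstract could be swapped in for `jaro_winkler`, provided the threshold is retuned for that metric's scale.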
