Testing Various Similarity Metrics and their Permutations with Clustering Approach in Context Free Data Cleaning

Paresh V Virparia; Sohil Dineshkumar Pandya

Abstract

Organizations can sustain growth in this knowledge era through proficient data analysis, which relies heavily on data quality. This paper emphasizes the use of sequence similarity metrics with a clustering approach in context-free data cleaning to improve data quality by reducing noise. The authors propose an algorithm that tests the suitability of a value for correcting other values of an attribute, based on the distance between them. Sequence similarity metrics such as Needleman-Wunsch, Jaro-Winkler, Chapman Ordered Name Similarity, and Smith-Waterman are used to compute the distance between two values. Experimental results show how the approach can effectively clean data without reference data.
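
The abstract does not spell out the authors' algorithm, so the following is only a minimal sketch of the general idea: score pairs of attribute values with a sequence similarity metric, then let frequent values act as cluster centers that suggest corrections for similar, rarer values. The function names, the scoring parameters (match +1, mismatch -1, gap -1), the normalization, and the 0.7 threshold are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two strings (Needleman-Wunsch).

    The scoring parameters are illustrative assumptions.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map the alignment score into [0, 1] (hypothetical normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0, needleman_wunsch_score(a, b)) / longest


def suggest_corrections(values, threshold=0.7):
    """Greedy clustering: frequent values become centers, and rarer values
    that are similar enough to a center are mapped to it."""
    centers = []
    corrections = {}
    # Visit values from most to least frequent.
    for value, _count in Counter(values).most_common():
        best = max(centers, key=lambda c: similarity(value, c), default=None)
        if best is not None and similarity(value, best) >= threshold:
            corrections[value] = best
        else:
            centers.append(value)
    return corrections


if __name__ == "__main__":
    cities = ["Ahmedabad"] * 3 + ["Vadodara"] * 2 + ["Ahmedabd", "Vadodra"]
    print(suggest_corrections(cities))
    # {'Ahmedabd': 'Ahmedabad', 'Vadodra': 'Vadodara'}
```

No reference dictionary of valid values is used here: the candidate corrections come from the attribute's own values, which is the sense in which the cleaning is context free. Any of the other metrics named in the abstract could be substituted for `similarity` without changing the clustering step.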

Bibliographic record

Source: International Journal of Computer Science and Security, 2009, Issue 5
Authors: Paresh V Virparia; Sohil Dineshkumar Pandya
Original format: PDF
Classification (Chinese Library Classification): Automation technology, computer technology
