Application of clustering and association methods in data cleaning

Abstract

Data cleaning is the process of maintaining data quality in information systems. Current data cleaning solutions require reference data to identify incorrect or duplicate entries. This article proposes using data mining in data cleaning, since it can discover reference data and validation rules from the data itself. Two algorithms for data attribute correction, both designed by the author and both based on data mining methods, are presented. Experimental results show that both algorithms can effectively clean text attributes without external reference data.
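
The abstract does not describe the two algorithms in detail, so the following is only a rough illustration of the general idea of correcting a text attribute without external reference data. It is not the author's algorithm. It assumes that frequent values can serve as reference values discovered from the data itself, and that rare values sufficiently similar to a frequent one are misspellings of it. The function name `correct_attribute`, the similarity threshold of 0.8, and the sample city names are all assumptions made for this sketch.

```python
from collections import Counter
from difflib import SequenceMatcher


def correct_attribute(values, threshold=0.8):
    """Map each value to the most frequent value that is similar enough to it.

    Illustrative sketch only: frequent values are treated as reference
    values, and less frequent look-alikes are replaced by them.
    """
    counts = Counter(values)
    # Most frequent values first, so they become references before rarer ones.
    ranked = [value for value, _ in counts.most_common()]

    references = []  # values accepted as reference entries
    mapping = {}     # original value -> corrected value
    for value in ranked:
        # Look for an already accepted reference that resembles this value.
        match = next(
            (
                ref
                for ref in references
                if SequenceMatcher(None, value.lower(), ref.lower()).ratio()
                >= threshold
            ),
            None,
        )
        if match is None:
            # Nothing similar so far: treat the value as a new reference.
            references.append(value)
            mapping[value] = value
        else:
            mapping[value] = match
    return [mapping[value] for value in values]


if __name__ == "__main__":
    cities = ["Warsaw", "Warsaw", "Warsaw", "Warsav", "Krakow", "Krakow", "Krakoww"]
    print(correct_attribute(cities))
```

In this sketch the threshold controls how aggressively values are merged: a lower threshold corrects more misspellings but risks merging distinct legitimate values.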

Bibliographic details

Source: Computer Science and Information Technology, 2008 International Multiconference on, pp. 97-103 (7 pages)
Author: Ciszak Lukasz
Original format: PDF
Chinese Library Classification: Industrial technology

Similar literature

1. Attribute Correction - Data Cleaning Using Association Rule and Clustering Methods [J]. R.Kavitha Kumar, RM.Chadrasekaran. International Journal of Data Mining & Knowledge Management Process. 2011, No. 2
2. Application of Efficient Data Cleaning Using Text Clustering for Semistructured Medical Reports to Large-Scale Stool Examination Reports: Methodology Study [J]. Hyunki Woo, Kyunga Kim, KyeongMin Cha, Journal of medical Internet research. 2019, No. 1
3. Oversampling Methods Combined Clustering and Data Cleaning for Imbalanced Network Data [J]. Yang Yang, Zhao Qian, Ruan Linna, Intelligent automation and soft computing. 2020, No. 5
4. Application of clustering and association methods in data cleaning [C]. Ciszak Lukasz. International Multiconference on Computer Science and Information Technology. 2008
5. Methods for analyzing high dimensional data: Classification, measurement error model and graph based association measures, with applications to microarray data [D]. Ding, Beiying. 2004
6. Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata [O]. Wei Hu, Amrapali Zaveri, Honglei Qiu, 2017
7. Modeling Genetical Data with Forests of Latent Trees for Applications in Association Genetics at a Large Scale - Which Clustering Method should Be Chosen? [O]. D.-T. Phan, P. Leray, C. Sinoquet. 2015
8. Graph Theory Derived Methods for the Study of Metal Cluster Bonding Topology: Applications to Post-Transition Metal Clusters [R]. King, R. B. 1986
