A Study of K-Nearest Neighbour as an Imputation Method

Abstract

Data quality is a major concern in Machine Learning and related areas such as Knowledge Discovery from Databases (KDD). As most Machine Learning algorithms induce knowledge strictly from data, the quality of the knowledge extracted is largely determined by the quality of the underlying data. One relevant problem in data quality is the presence of missing data. Despite the frequent occurrence of missing data, many Machine Learning algorithms handle it in a rather naive way. Missing data treatment should be carefully thought out, otherwise bias might be introduced into the knowledge induced. In this work, we analyse the use of the k-nearest neighbour algorithm as an imputation method. Imputation is a term that denotes a procedure that replaces the missing values in a data set with plausible values. Our analysis indicates that missing data imputation based on the k-nearest neighbour algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data.
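The general idea of k-nearest neighbour imputation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name knn_impute, the Euclidean distance restricted to attributes observed in both rows, and the use of the neighbours' mean as the imputed value are all assumptions, and only numeric attributes are handled.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries using the k nearest rows (hypothetical helper).

    Assumptions: all attributes are numeric, distance is Euclidean over
    the attributes observed in both rows (normalised by their count),
    and the imputed value is the mean over the k nearest donor rows.
    """
    def distance(a, b):
        # Compare only attributes that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows in which attribute j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data set: the second row is missing its third attribute.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(data, k=2)
```

With k=2 the missing entry is filled from the two rows closest to it on the observed attributes, so the distant fourth row does not influence the result. A practical version would also need a rule for qualitative attributes (for example, the most frequent value among the neighbours), which this sketch leaves out.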

Bibliographic details

Source
Second International Conference on Hybrid Intelligent Systems, Dec 1-4, 2002, Santiago de Chile | 2002 | pp. 251-260 | 10 pages
Conference location: Santiago (CL)
Authors
Gustavo E. A. P. A. Batista; Maria Carolina Monard;
Author affiliation

University of Sao Paulo (USP), Institute of Mathematics and Computer Science (ICMC), Department of Computer Science and Statistics (SCE), Laboratory of Computational Intelligence (LABIC), P. O. Box 668, 13560-970, Sao Carlos, SP, Brazil

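Conference organizer
Original format: PDF
Language of text: eng
Chinese Library Classification: automation technology, computer technology
Keywords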

Similar literature

1. Single imputation with multilayer perceptron and multiple imputation combining multilayer perceptron and k-nearest neighbours for monotone patterns [J]. Silva-Ramireza Esther-Lydia, Pino-Mejias Rafael, Lopez-Coello Manuel. Applied Soft Computing. 2015
2. How distance metrics influence missing data imputation with k-nearest neighbours [J]. Miriam Seoane Santos, Pedro Henriques Abreu, Szymon Wilk. Pattern Recognition Letters. 2020, issue Auga
3. Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach [J]. Erica Tavazzi, Sebastian Daberdaku, Rosario Vasta. BMC Medical Informatics and Decision Making. 2020, issue 5
4. An improved k-nearest neighbours method for traffic time series imputation [C]. Bin Sun, Liyao Ma, Wei Cheng. Chinese Automation Congress. 2017
5. Comparative classification of prostate cancer data using the Support Vector Machine, Random Forest, DualKS and k-Nearest Neighbours [D]. Sakouvogui, Kekoura. 2015
6. Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach [O]. Erica Tavazzi, Sebastian Daberdaku, Rosario Vasta. 2020
7. Estimating individual tree growth with the k-nearest neighbour and k-Most Similar Neighbour methods [O]. Sironen, Susanna, Kangas, Annika, Maltamo, Matti. 2001
