A Study of K-Nearest Neighbour as an Imputation Method


Abstract

Data quality is a major concern in Machine Learning and other correlated areas such as Knowledge Discovery from Databases (KDD). As most Machine Learning algorithms induce knowledge strictly from data, the quality of the knowledge extracted is largely determined by the quality of the underlying data. One relevant problem in data quality is the presence of missing data. Despite the frequent occurrence of missing data, many Machine Learning algorithms handle missing data in a rather naive way. Missing data treatment should be carefully thought out; otherwise, bias might be introduced into the knowledge induced. In this work, we analyse the use of the k-nearest neighbour algorithm as an imputation method. Imputation is a term that denotes a procedure that replaces the missing values in a data set with plausible values. Our analysis indicates that missing data imputation based on the k-nearest neighbour algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data.
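To make the idea of k-nearest neighbour imputation concrete, the following is a minimal sketch and not the authors' implementation. It assumes purely numeric attributes, measures distance as the root mean squared difference over the attributes that two rows both have observed, and fills each missing value with the mean of that attribute among the k closest rows. The function name `knn_impute` and these design choices are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill NaN entries with the mean of the k nearest rows (illustrative sketch).

    Assumptions (not taken from the paper): numeric attributes only,
    distance computed over attributes observed in both rows.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)

    # Visit every row that has at least one missing attribute.
    for i in np.where(missing.any(axis=1))[0]:
        observed_i = ~missing[i]
        for j in np.where(missing[i])[0]:
            candidates = []
            for r in range(X.shape[0]):
                # A donor must be a different row with attribute j observed.
                if r == i or missing[r, j]:
                    continue
                shared = observed_i & ~missing[r]
                if not shared.any():
                    continue  # no common attributes, distance undefined
                diff = X[i, shared] - X[r, shared]
                dist = float(np.sqrt(np.mean(diff ** 2)))
                candidates.append((dist, r))
            if not candidates:
                continue  # no usable donor: leave the value missing
            candidates.sort()
            nearest = [r for _, r in candidates[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled


data = np.array([
    [1.0, 2.0, np.nan],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [10.0, 10.0, 100.0],
])
imputed = knn_impute(data, k=2)
```

In this toy data set the first row is closest to the second and third rows, so its missing third attribute is filled from those two neighbours and the distant fourth row is ignored. For nominal attributes, a common variation would replace the mean with the most frequent value among the neighbours.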
