Feature selection with missing data using mutual information estimators


Abstract

Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses feature selection in prediction problems where some feature values are missing. To this end, the well-known mutual information criterion is used. More precisely, it is shown how a recently introduced nearest-neighbor-based mutual information estimator can be extended to handle missing data. Unlike traditional estimators, this one does not directly estimate any probability density function. Consequently, the mutual information may be reliably estimated even when the dimension of the space increases. Results on artificial as well as real-world datasets indicate that the method is able to select important features without the need for any imputation algorithm, under the assumption that data are missing completely at random (MCAR). Moreover, experiments show that selecting the features before imputing the data generally increases the precision of the prediction models, in particular when the proportion of missing data is high.
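As a rough illustration of the idea in the abstract, the sketch below ranks features by a k-nearest-neighbor mutual information estimate without imputing anything. It rests on several assumptions that are not taken from the paper: the estimator is a Kraskov-style (KSG, first variant) estimator, missing values are encoded as `None`, and each feature is scored on its own using only the samples where it is observed, which is reasonable under MCAR. The function names `digamma`, `knn_mutual_information` and `rank_features` are hypothetical, and the abstract does not describe how the authors' extension actually works, so this should not be read as their method.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward so the series below is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def knn_mutual_information(xs, ys, k=3):
    """KSG-style estimate of I(X;Y) in nats for paired scalar samples."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distances from sample i to all others in the joint space.
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Neighbors strictly closer than eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma(n_x + 1) + digamma(n_y + 1)
    return digamma(k) + digamma(n) - total / n


def rank_features(rows, targets, k=3):
    """Return (score, feature index) pairs, highest estimated MI first."""
    scores = []
    for f in range(len(rows[0])):
        # Assumption: drop only the samples where this feature is missing.
        pairs = [(r[f], t) for r, t in zip(rows, targets) if r[f] is not None]
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        scores.append((knn_mutual_information(xs, ys, k), f))
    return sorted(scores, reverse=True)


# Synthetic demo: feature 0 drives the target, feature 1 is pure noise.
random.seed(0)
rows, targets = [], []
for _ in range(150):
    x0, x1 = random.gauss(0, 1), random.gauss(0, 1)
    # Hide each value with probability 0.2, independently of the data (MCAR).
    rows.append([v if random.random() > 0.2 else None for v in (x0, x1)])
    targets.append(x0 + 0.1 * random.gauss(0, 1))

for score, feature in rank_features(rows, targets):
    print(f"feature {feature}: estimated MI = {score:.3f} nats")
```

Scoring each feature on its observed samples keeps the sketch short, but it ignores interactions between features. Scoring feature subsets jointly would need a distance that tolerates missing coordinates, which this sketch does not attempt.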

Bibliographic details

  • Source
    Neurocomputing | 2012, issue 2012 | pp. 3-11 | 9 pages
  • Author affiliations

    Machine Learning Group-ICTEAM, Universite catholique de Louvain, Place du Levant 3, 1348 Louvain-la-Neuve, Belgium;

    Machine Learning Group-ICTEAM, Universite catholique de Louvain, Place du Levant 3, 1348 Louvain-la-Neuve, Belgium;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    feature selection; missing data; mutual information;

