Journal: Neurocomputing

Theoretical foundations of forward feature selection methods based on mutual information


Abstract

Feature selection problems arise in a variety of applications, such as microarray analysis, clinical prediction, text categorization, image classification and face recognition, multi-label learning, and classification of internet traffic. Among the various classes of methods, forward feature selection methods based on mutual information have become very popular and are widely used in practice. However, comparative evaluations of these methods have been limited because they rely on specific datasets and classifiers. In this paper, we develop a theoretical framework that allows the methods to be evaluated on the basis of their theoretical properties. Our framework is grounded in the properties of the target objective function that the methods try to approximate, and in a novel categorization of features according to their contribution to the explanation of the class; we derive upper and lower bounds for the target objective function and relate these bounds to the feature types. Then, we characterize the types of approximations made by the methods, and analyze how well these approximations retain the good properties of the target objective function. Additionally, we develop a distributional setting designed to illustrate the various deficiencies of the methods, and provide several examples of wrong feature selections. Based on our work, we clearly identify the methods that should be avoided and the methods that currently have the best performance. (C) 2018 Elsevier B.V. All rights reserved.
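
For readers unfamiliar with the setting, the following is a minimal sketch of greedy forward feature selection driven by mutual information. It is not taken from the paper. It assumes discrete features and takes the objective to be the plug-in estimate of the joint mutual information between the class and the selected feature set, which the abstract does not state explicitly. The function names (`entropy`, `joint_mi`, `forward_select`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(outcomes):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in Counter(outcomes).values())


def joint_mi(columns, labels):
    """Plug-in estimate of I(C; X_S) = H(X_S) + H(C) - H(X_S, C).

    `columns` is a list of equally long feature columns (the set S),
    `labels` is the class column C.
    """
    rows = list(zip(*columns))  # one tuple of feature values per sample
    return entropy(rows) + entropy(labels) - entropy(list(zip(rows, labels)))


def forward_select(features, labels, k):
    """Greedily add the feature that maximizes the assumed objective.

    `features` maps a feature name to its column of discrete values.
    Ties are broken by alphabetical order of the feature names.
    """
    selected = []
    remaining = sorted(features)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda f: joint_mi([features[g] for g in selected + [f]], labels),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): the class is a XOR b; d is only partly informative.
toy_features = {
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "d": [0, 1, 1, 1],
}
toy_labels = [0, 1, 1, 0]

if __name__ == "__main__":
    print(forward_select(toy_features, toy_labels, k=1))
    print(joint_mi([toy_features["a"], toy_features["b"]], toy_labels))
```

On this toy data the first greedy step picks `d`, because `a` and `b` each carry zero information about the class on their own, even though together they determine it completely. This is only an informal illustration of how a greedy forward search can be misled. Practical methods replace the joint estimate with lower-dimensional approximations, since estimating the joint distribution becomes unreliable as the selected set grows.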
