Conference: International Conference on Intelligent Computer Mathematics

Coordinating Discernibility and Independence Scores of Variables in a 2D Space for Efficient and Accurate Feature Selection



Abstract

Feature selection removes redundant and irrelevant features from the original features of exemplars, so that a sparse and representative feature subset can be found for building a more efficient and accurate classifier. This paper presents a novel definition of the discernibility and independence scores of a feature, and then constructs a two-dimensional (2D) space, with the feature's independence as the y-axis and its discernibility as the x-axis, to rank the importance of features. The new method is named FSDI (Feature Selection based on Discernibility and Independence of a feature). The discernibility score measures a feature's ability to distinguish instances from different classes. The independence score measures the redundancy of a feature. All features are plotted in the 2D space according to their discernibility and independence coordinates. The area of the rectangle corresponding to a feature's discernibility and independence in the 2D space is used as the criterion for ranking the importance of the features. The top-k features, whose importance is much higher than that of the remaining ones, are selected to form the sparse and representative feature subset for building an efficient and accurate classifier. Experimental results on 5 classical gene expression datasets demonstrate that the proposed FSDI algorithm selects the gene subset efficiently and achieves the best classification performance. The method provides a good solution to the bottleneck caused by the high time complexity of existing gene subset selection algorithms.
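The ranking idea in the abstract (score each feature on two axes, then rank by the area of the rectangle the two scores span) can be sketched as follows. The abstract does not give the formulas for either score, so both are stand-ins chosen for illustration: discernibility is assumed to be a Fisher-style ratio of between-class to within-class variance, and independence is assumed to be one minus the largest absolute Pearson correlation with any more discernible feature. The function name `fsdi_rank` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def fsdi_rank(X, y, k):
    """Rank features by discernibility * independence and return the top k.

    Both scores are illustrative assumptions, not the paper's definitions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_features = X.shape[1]

    # Assumed discernibility: between-class / within-class variance per feature.
    overall_mean = X.mean(axis=0)
    between = np.zeros(n_features)
    within = np.zeros(n_features)
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    discernibility = between / (within + 1e-12)

    # Assumed independence: 1 - max |Pearson r| with any feature that has a
    # higher discernibility; the most discernible feature is given 1.0.
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    order = np.argsort(-discernibility)
    independence = np.ones(n_features)
    for pos in range(1, n_features):
        i = order[pos]
        independence[i] = 1.0 - corr[i, order[:pos]].max()

    # Rectangle area in the (discernibility, independence) plane.
    area = discernibility * independence
    top_k = np.argsort(-area)[:k]
    return top_k, discernibility, independence, area
```

Under these assumptions, a near-duplicate of a strong feature gets an independence close to zero, so its rectangle collapses and it drops out of the top-k even though its discernibility is high. Calling `fsdi_rank(X, y, k)` on a samples-by-genes matrix `X` with class labels `y` returns the selected column indices along with the three score vectors.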
