Journal: IEEE Transactions on Knowledge and Data Engineering

MUSE: Minimum Uncertainty and Sample Elimination Based Binary Feature Selection

Abstract

This paper presents MUSE, a novel incremental feature selection method based on minimum uncertainty and feature sample elimination. Feature selection is an important step in machine learning. Past incremental approaches have attempted to increase class relevance while simultaneously minimizing redundancy with previously selected features; one example is minimum Redundancy Maximum Relevance (mRMR). The proposed approach differs from mRMR in how the redundancy of the current feature with previously selected features is reduced. The samples of each feature are first divided into a pre-specified number of bins, a step referred to as feature quantization. An uncertainty score is then computed for each feature by summing the conditional entropies of its bins, and the feature with the lowest uncertainty score is selected. The impurity of each bin is the minimum of the probability of Class 1 and the probability of Class 2. Samples falling in bins whose impurity is below a threshold (that is, bins dominated by a single class) are discarded and are not used when selecting subsequent features. The method is demonstrated on two datasets, arrhythmia and handwritten digit recognition (Gisette), and on seizure prediction datasets from five dogs and two humans. The proposed method outperforms mRMR in most cases. On the arrhythmia dataset it achieves 30 percent higher sensitivity at the expense of a 7 percent loss of specificity. On the Gisette dataset it achieves 15 percent higher accuracy for Class 2 at the expense of 3 percent lower accuracy for Class 1. For seizure prediction, it achieves a higher area under the curve (AUC) for all seven subjects.
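
The selection loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation. Several details are assumptions because the abstract does not specify them: equal-width binning, quantizing each feature once up front, weighting each bin's conditional entropy by the fraction of samples in that bin, the default threshold of 0.1, and stopping when no samples remain. The function names (quantize, uncertainty, muse_select) are hypothetical, and the two classes are encoded as 0 and 1.

```python
import math

def quantize(values, n_bins):
    """Equal-width binning (assumption: the abstract does not fix the scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def bin_stats(bins, labels, n_bins):
    """Per-bin [sample count, count of label 1] for binary labels in {0, 1}."""
    stats = [[0, 0] for _ in range(n_bins)]
    for b, y in zip(bins, labels):
        stats[b][0] += 1
        stats[b][1] += y
    return stats

def entropy(p):
    """Binary entropy in bits of a class probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def uncertainty(stats, n):
    """Sum over bins of P(bin) * H(class | bin); the weighting is an assumption."""
    return sum(c / n * entropy(k / c) for c, k in stats if c)

def muse_select(X, y, n_select, n_bins=4, threshold=0.1):
    """X: list of sample rows, y: list of 0/1 labels. Returns chosen column indices."""
    n_features = len(X[0])
    # Quantize every feature once, over all samples (assumption).
    Q = [quantize([row[j] for row in X], n_bins) for j in range(n_features)]
    active = list(range(len(X)))       # samples not yet eliminated
    remaining = list(range(n_features))
    selected = []
    while remaining and active and len(selected) < n_select:
        labels = [y[i] for i in active]
        best = None
        for j in remaining:
            stats = bin_stats([Q[j][i] for i in active], labels, n_bins)
            score = uncertainty(stats, len(active))
            if best is None or score < best[0]:
                best = (score, j, stats)
        _, j, stats = best
        selected.append(j)
        remaining.remove(j)
        # Bin impurity: the smaller of the two class probabilities in the bin.
        impurity = [min(k / c, 1 - k / c) if c else 0.0 for c, k in stats]
        # Sample elimination: drop samples that fall in low-impurity bins.
        active = [i for i in active if impurity[Q[j][i]] >= threshold]
    return selected
```

Calling muse_select(X, y, n_select=10) on a list of sample rows X and matching 0/1 labels y returns the column indices in the order they were chosen. Because samples in nearly pure bins are removed after each pick, later features are scored only on the samples that earlier features left ambiguous, which is how this sketch discourages redundancy without an explicit feature-to-feature redundancy term.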
