Conference: International Conference on Computational Intelligence Modeling Techniques and Applications

An Improved Minimum Redundancy Maximum Relevance Approach for Feature Selection in Gene Expression Data

Abstract

This article proposes an improved feature selection technique that uses mutual information as the basic criterion for measuring feature relevance and redundancy. The mutual information between a feature and the class labels defines the relevance of that feature, while the mutual information among different features defines their correlation, i.e., the redundancy among those features. The objective is to find a feature set for which the mutual information between the features and the class labels is maximized and the mutual information among the features is minimized; in other words, the proposed method seeks the most relevant and least redundant feature set. The number of output features is provided by the user. First, the most relevant feature is added to the initially empty final feature set. Then, in each iteration, a non-dominated feature set with respect to relevance and redundancy is generated, and the most relevant and non-redundant feature from this set is added to the final feature set. One feature is added per iteration in this incremental way, and the step is repeated until the size of the final feature set equals the user-specified number of features. The features in the final feature set have maximum relevance and least correlation. The proposed method is applied to microarray gene expression data to find the most relevant and non-redundant genes, and its performance is compared with that of the popular mRMR (MIQ) and mRMR (MID) schemes on several real-life data sets.
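
The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes features are already discretized, that redundancy is the average mutual information with the features selected so far, and that the feature taken from the non-dominated set is the one with the largest relevance minus redundancy. The abstract does not specify any of these details, and the function names `mutual_information`, `non_dominated`, and `select_features` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written in terms of raw counts
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def non_dominated(candidates, relevance, redundancy):
    """Candidates that no other candidate beats on both objectives
    (higher relevance, lower redundancy)."""
    front = []
    for i in candidates:
        dominated = any(
            relevance[j] >= relevance[i]
            and redundancy[j] <= redundancy[i]
            and (relevance[j] > relevance[i] or redundancy[j] < redundancy[i])
            for j in candidates
        )
        if not dominated:
            front.append(i)
    return front


def select_features(features, labels, k):
    """features: list of discrete-valued columns. Returns indices of k columns."""
    m = len(features)
    relevance = [mutual_information(f, labels) for f in features]
    # Step 1: start with the single most relevant feature.
    selected = [max(range(m), key=relevance.__getitem__)]
    while len(selected) < min(k, m):
        candidates = [i for i in range(m) if i not in selected]
        # ASSUMPTION: redundancy = mean mutual information with selected features.
        redundancy = {
            i: sum(mutual_information(features[i], features[s]) for s in selected)
            / len(selected)
            for i in candidates
        }
        front = non_dominated(candidates, relevance, redundancy)
        # ASSUMPTION: choose from the front by relevance minus redundancy.
        selected.append(max(front, key=lambda i: relevance[i] - redundancy[i]))
    return selected


# Toy data: column 1 duplicates column 0, column 3 is unrelated to the labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
print(select_features(columns, labels, 2))  # [0, 2]
```

On the toy data, the duplicated column 1 is skipped in favor of the less relevant but far less redundant column 2. Note that with the difference-based choice assumed here, the selected feature always lies on the non-dominated front, so this sketch behaves like a MID-style greedy step. It shows the structure of the loop rather than whatever selection rule distinguishes the proposed method from mRMR.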
