Journal: Applied Soft Computing

A hybrid algorithm for feature subset selection in high-dimensional datasets using FICA and IWSSr algorithm


Abstract

Feature subset selection is a substantial problem in data classification. Its purpose is to find an efficient subset of the features in the original dataset that increases both efficiency and accuracy rate and reduces the cost of classification. High-dimensional datasets, which have a very large number of predictive attributes but only a small number of instances, require dedicated techniques for selecting an optimal feature subset. In this paper, a hybrid method is proposed for efficient subset selection in high-dimensional datasets. The proposed algorithm combines filter and wrapper algorithms in two phases. In the filter phase, the symmetrical uncertainty (SU) criterion is used to weight features according to how well they discriminate the classes. In the wrapper phase, both FICA (fuzzy imperialist competitive algorithm) and IWSSr (Incremental Wrapper Subset Selection with replacement) are executed in the weighted feature space to find relevant attributes. The new scheme is successfully applied to 10 standard high-dimensional datasets, mainly from the biosciences and medicine, where the number of features is large compared to the number of samples, which induces a severe curse-of-dimensionality problem. A comparison with other algorithms confirms that the proposed method achieves the highest accuracy rate and also yields an efficient, compact feature subset. (C) 2015 Elsevier B.V. All rights reserved.
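To make the two phases concrete, here is a minimal Python sketch of the general filter-then-wrapper idea. It is not the authors' implementation. It assumes discrete feature values, uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and follows the filter with a simplified incremental wrapper that, for each ranked feature, tries either adding it or swapping it for an already selected one. All function names, the toy data, and the `toy_score` evaluator are hypothetical. Accepting a move on any strict score improvement is also an assumption, not the acceptance rule of the paper. FICA is not attempted here, since the abstract gives no details of it.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); lies in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing is shared
    h_xy = entropy(list(zip(x, y)))          # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy           # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Filter phase: (feature_index, SU) pairs, highest SU with the class first."""
    scores = [(j, symmetrical_uncertainty(col, labels))
              for j, col in enumerate(columns)]
    return sorted(scores, key=lambda t: t[1], reverse=True)


def incremental_wrapper_with_replacement(ranked, score):
    """Wrapper phase (simplified, IWSSr-style).

    `ranked` lists feature indices in decreasing SU order. `score` is a
    caller-supplied function (hypothetical here) that maps a tuple of feature
    indices to an estimate of classification accuracy.
    """
    selected, best = [], float("-inf")
    for f in ranked:
        # Candidate moves: append f, or swap f in for one selected feature.
        candidates = [selected + [f]]
        candidates += [selected[:i] + [f] + selected[i + 1:]
                       for i in range(len(selected))]
        top_score, top = max(((score(tuple(s)), s) for s in candidates),
                             key=lambda t: t[0])
        if top_score > best:  # assumption: accept any strict improvement
            selected, best = top, top_score
    return selected, best


# Toy data (hypothetical): three discrete features, four instances.
labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # feature 0: identical to the class
    [0, 1, 0, 1],  # feature 1: independent of the class
    [0, 0, 0, 1],  # feature 2: partially informative
]
ranking = [j for j, _ in rank_features(columns, labels)]


def toy_score(subset):
    """Stand-in evaluator: rewards features 0 and 2, penalises subset size."""
    return len(set(subset) & {0, 2}) - 0.1 * len(subset)


subset, subset_score = incremental_wrapper_with_replacement(ranking, toy_score)
print(ranking, subset)
```

In practice `score` would be something like the cross-validated accuracy of a classifier trained on the chosen columns, and continuous attributes would need to be discretised before SU can be computed this way.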