Journal: Knowledge-Based Systems

Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches

Abstract

Feature selection in high dimension, low sample size (HDLSS) data is always an important data pre-processing task. In the literature, the concept of ensemble learning has been applied to improve single feature selection methods, the so-called ensemble feature selection techniques. The most widely used approach is to combine multiple feature selection methods and their selection results via some sort of aggregation function in a parallel manner. Another ensemble strategy is based on the serial combination approach where the selection results of the first feature selection stage are used as input for the second stage of feature selection to produce the final output. The aim of this paper is to fully explore the performance of parallel and serial combination approaches for ensemble feature selection over HDLSS data. In particular, we strive to answer two research questions: whether parallel and serial based ensemble feature selection can outperform single feature selection and which combination approach is the better choice for ensemble feature selection. The experimental results based on comparing nine parallel and nine serial combinations, as well as three single baseline feature selection methods, including principal component analysis (PCA), genetic algorithm (GA), and C4.5 decision tree, show that ensemble feature selection performs better than single feature selection in terms of classification accuracy. However, there are no significant differences in performance between the single best baseline method (i.e. GA) and the top three parallel and serial combinations. On the other hand, the serial combination approach produces the largest feature reduction rate. (C) 2020 Elsevier B.V. All rights reserved.
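
The two combination strategies described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. Two simple filter scores (feature variance and absolute Pearson correlation with the label) stand in for the PCA, GA, and C4.5 selectors. Union and intersection are assumed as the aggregation functions for the parallel case. The function names, the toy data, and the cut-off values `k` are all hypothetical.

```python
from statistics import mean, pvariance


def variance_scores(X):
    """Score each feature (column of X) by its population variance."""
    return [pvariance(col) for col in zip(*X)]


def correlation_scores(X, y):
    """Score each feature by its absolute Pearson correlation with y."""
    y_mean = mean(y)
    y_dev = [v - y_mean for v in y]
    y_norm = sum(d * d for d in y_dev) ** 0.5
    scores = []
    for col in zip(*X):
        c_mean = mean(col)
        c_dev = [v - c_mean for v in col]
        c_norm = sum(d * d for d in c_dev) ** 0.5
        if c_norm == 0 or y_norm == 0:
            scores.append(0.0)  # a constant column carries no signal
        else:
            dot = sum(a * b for a, b in zip(c_dev, y_dev))
            scores.append(abs(dot) / (c_norm * y_norm))
    return scores


def top_k(scores, k):
    """Return the sorted indices of the k highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def parallel_select(X, y, k, mode="union"):
    """Parallel combination: run both selectors on the full data, then aggregate."""
    a = set(top_k(variance_scores(X), k))
    b = set(top_k(correlation_scores(X, y), k))
    return sorted(a | b) if mode == "union" else sorted(a & b)


def serial_select(X, y, k_first, k_second):
    """Serial combination: stage-one output becomes the input of stage two."""
    stage1 = top_k(variance_scores(X), k_first)
    X_reduced = [[row[i] for i in stage1] for row in X]
    stage2 = top_k(correlation_scores(X_reduced, y), k_second)
    return [stage1[j] for j in stage2]  # map back to original feature indices


def reduction_rate(n_selected, n_total):
    """Fraction of the original features that were discarded."""
    return 1 - n_selected / n_total


# Toy data (hypothetical): 4 samples, 5 features, binary labels.
X = [
    [1.0, 5.0, 0.0, 2.0, 7.0],
    [2.0, 5.1, 9.0, 2.0, 1.0],
    [3.0, 4.9, 1.0, 2.0, 8.0],
    [4.0, 5.0, 8.0, 2.0, 2.0],
]
y = [0, 1, 0, 1]

union_features = parallel_select(X, y, k=3, mode="union")
intersection_features = parallel_select(X, y, k=3, mode="intersection")
serial_features = serial_select(X, y, k_first=3, k_second=2)
```

On this toy data the union keeps features 0, 1, 2 and 4, while the intersection and the serial pipeline both keep features 2 and 4. The corresponding `reduction_rate` values are about 0.2 for the union and 0.6 for the serial pipeline. This is only a mechanical illustration of why chaining selectors tends to discard more features. It says nothing about classification accuracy, which the paper evaluates experimentally.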