Journal: BMC Bioinformatics

Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data

Abstract

Feature selection in class-imbalance learning has gained increasing attention in recent years because of the massive growth of high-dimensional class-imbalanced data across many scientific fields. In addition to reducing model complexity and discovering key biomarkers, feature selection is an effective way to combat class overlap, which may arise in such data and can become a crucial factor in classification performance. However, ordinary feature selection techniques for classification cannot simply be applied to class-imbalanced data without adjustment. More efficient feature selection techniques must therefore be developed for complicated class-imbalanced data, especially in the high-dimensional setting. We propose an algorithm called sssHD to achieve stable sparse feature selection and apply it to complicated class-imbalanced data. sssHD is based on the Hellinger distance (HD) coupled with sparse regularization techniques. We state that the Hellinger distance is not only class-insensitive but also translation-invariant. Simulation results indicate that the HD-based selection algorithm is effective in recognizing key features and controlling false discoveries in class-imbalance learning. Five gene expression datasets are also used to test the performance of the sssHD algorithm, and it is compared with several existing selection procedures. The results show that sssHD is highly competitive in terms of five assessment metrics. In addition, sssHD shows only limited differences between results obtained with and without re-balance preprocessing. sssHD is a simple, practical feature selection method for high-dimensional class-imbalanced data and can serve as an alternative for performing feature selection on such data. It can be easily extended by combining it with different re-balance preprocessing methods, different sparse regularization structures, and different classifiers. As such, the algorithm is very general and has a wide range of applicability.
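
The two properties claimed for the Hellinger distance can be illustrated with a minimal sketch. It is not the sssHD algorithm: it omits the sparse regularization and the stability step, and it assumes (an assumption of this sketch, not something stated in the abstract) that each class-conditional feature distribution is univariate normal, for which the distance has a well-known closed form. The function names `hellinger_normal` and `feature_score` are hypothetical.

```python
import math
from statistics import fmean, pstdev


def hellinger_normal(mu1, sigma1, mu2, sigma2):
    """Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Uses the convention H^2 = 1 - BC, so the result lies in [0, 1].
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Bhattacharyya coefficient of two univariate normal densities.
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def feature_score(x_minority, x_majority):
    """Score one feature by the distance between its two class-conditional
    distributions, each summarized by its mean and population standard deviation
    (normality is an assumption of this sketch)."""
    return hellinger_normal(
        fmean(x_minority), pstdev(x_minority),
        fmean(x_majority), pstdev(x_majority),
    )


if __name__ == "__main__":
    # Made-up toy values for a single feature, not data from the paper.
    minority = [2.1, 2.9, 3.4]
    majority = [0.2, -0.5, 1.1, 0.4, -0.3]

    # Translation invariance: shifting both classes by the same constant
    # leaves the distance unchanged, because only mu1 - mu2 enters the formula.
    shifted = feature_score([v + 10.0 for v in minority],
                            [v + 10.0 for v in majority])
    print(feature_score(minority, majority), shifted)

    # Class insensitivity: replicating the majority sample changes the class
    # ratio but not its mean or standard deviation, so the score is unchanged.
    print(feature_score(minority, majority * 10))
```

The score depends only on the two class-conditional distributions, not on how many samples each class contributes, which is the sense in which it is insensitive to the class ratio in this sketch. With real data the per-class estimates would still vary with sample size, so the invariance holds for the population quantity rather than exactly for every finite sample.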
