Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data

Guang-Hui Fu; Yuan-Jiao Wu; Min-Jie Zong; Jianxin Pan


Abstract

Feature selection in class-imbalance learning has gained increasing attention in recent years due to the massive growth of high-dimensional class-imbalanced data across many scientific fields. In addition to reducing model complexity and discovering key biomarkers, feature selection is an effective way to combat the class overlap that may arise in such data and that can become a crucial factor in classification performance. However, ordinary feature selection techniques for classification cannot simply be applied to class-imbalanced data without adjustment. More efficient feature selection techniques must therefore be developed for complicated class-imbalanced data, especially in the high-dimensional setting. We propose an algorithm called sssHD to achieve stable sparse feature selection and apply it to complicated class-imbalanced data. sssHD is based on the Hellinger distance (HD) coupled with sparse regularization techniques. We show that the Hellinger distance is not only class-insensitive but also translation-invariant. Simulation results indicate that the HD-based selection algorithm is effective at recognizing key features and controlling false discoveries in class-imbalance learning. Five gene expression datasets are also used to test the performance of the sssHD algorithm, and a comparison with several existing selection procedures is performed. The results show that sssHD is highly competitive in terms of five assessment metrics. In addition, sssHD shows only limited differences between performing and not performing re-balance preprocessing. sssHD is a simple, practical feature selection method and an alternative for performing feature selection in high-dimensional class-imbalanced data. It can easily be extended by combining it with different re-balance preprocessing methods, different sparse regularization structures, and different classifiers. As such, the algorithm is general and has a wide range of applicability.
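
The two properties claimed for the Hellinger distance can be illustrated with a small sketch. The abstract does not give a formula or say how sssHD estimates HD, so the code below rests on an assumption: each feature is treated as approximately normal within each class, which allows the standard closed form for the Hellinger distance between two univariate normal distributions. The function names (`hellinger_normal`, `population_mean_sd`, `hd_score`) are hypothetical, and this is not the sssHD algorithm itself: the sparse regularization step is omitted.

```python
import math


def hellinger_normal(mu0, sd0, mu1, sd1):
    """Hellinger distance between N(mu0, sd0^2) and N(mu1, sd1^2).

    Uses the standard closed form; assumes sd0 > 0 and sd1 > 0.
    """
    var_sum = sd0 ** 2 + sd1 ** 2
    # Bhattacharyya coefficient of the two normal densities.
    bc = math.sqrt(2.0 * sd0 * sd1 / var_sum) * math.exp(
        -((mu0 - mu1) ** 2) / (4.0 * var_sum)
    )
    # Clamp guards against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def population_mean_sd(values):
    """Mean and population standard deviation (divides by n)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, math.sqrt(var)


def hd_score(feature, labels):
    """Score one feature by the HD between its two class-conditional
    normal fits (illustrative assumption, labels are 0 or 1)."""
    neg = [x for x, y in zip(feature, labels) if y == 0]
    pos = [x for x, y in zip(feature, labels) if y == 1]
    return hellinger_normal(*population_mean_sd(neg), *population_mean_sd(pos))
```

Under these assumptions the score depends only on the two class-conditional distributions: shifting both classes by the same constant leaves it unchanged, and so does replicating the samples of one class, because class proportions never enter the formula.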


Bibliographic details

Source: BMC Bioinformatics, 2020, Issue 1, 14 pages
Authors: Guang-Hui Fu; Yuan-Jiao Wu; Min-Jie Zong; Jianxin Pan
Original format: PDF
Keywords: Hellinger distance; Class-imbalance learning; Feature selection; Sparse regularization
Date added to database: 2022-08-18 23:39:35

Similar literature

1. Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification [J]. Maldonado Sebastian, Lopez Julio. Applied Soft Computing. 2018.
2. Online feature selection for high-dimensional class-imbalanced data [J]. Zhou Peng, Hu Xuegang, Li Peipei. Knowledge-Based Systems. 2017, Issue Nova15.
3. Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines [J]. Sebastián Maldonado, Richard Weber, Fazel Famili. Information Sciences: An International Journal. 2014.
4. Supervised Feature Selection Method for High-Dimensional Data Classification in Photo-Thermal Infrared Imaging with Limited Training Data [C]. Nian Zhang, Keenan Leatham. International Conference on Control, Decision and Information Technologies. 2018.
5. Novel Metrics and Theoretical Properties of Nearest-Neighbor Distance-Based Feature Selection in High-Dimensional Bioinformatics Data [D]. Dawkins, Bryan A. 2020.
6. Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data [O]. Guang-Hui Fu, Yuan-Jiao Wu, Min-Jie Zong. 2020.
7. A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data [O]. Andrea Bommert, Jörg Rahnenführer, Michel Lang. 2017.
