Stable variable selection of class-imbalanced data with precision-recall criterion

Guang-Hui Fu; Feng Xu; Bing-Yang Zhang; Lun-Zhao Yi

Abstract

Screening important variables in class-imbalanced data remains a challenging task. In this study, we propose an algorithm for stably selecting key variables from class-imbalanced data based on the precision-recall curve (PRC): the PRC serves as the assessment criterion in the model-building stage, and sparse regularized logistic regression combined with subsampling (SRLRS) is designed to perform stable variable selection. Considering the characteristics of class-imbalanced data, we also propose a classification-based partition for cross-validation, as well as a subsampling scheme that leaves out half of the majority observations and one minority observation (LHO-LOO). Results on simulated and real data show that our algorithm is well suited to class-imbalanced data, and that the PRC can be an alternative evaluation criterion for model selection when handling such data.
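To make the PRC criterion concrete, the following is a minimal sketch of how the area under a precision-recall curve might be computed for a set of predicted scores, using the step-wise "average precision" sum. The function name `average_precision`, the 0/1 label coding (1 = minority class) and the choice of this particular area estimate are assumptions made for illustration; the abstract does not state how the authors summarize the PRC.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    y_true: 0/1 labels, where 1 marks the minority (positive) class.
    scores: predicted scores, higher meaning "more likely positive".
    Ties between scores are broken by input order (an illustrative simplification).
    """
    y_true = list(y_true)
    scores = list(scores)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive observation")

    # Rank observations from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])

    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the point where recall increases by 1 / n_pos.
            total += true_positives / rank
    return total / n_pos
```

A candidate model (for example, one value of the regularization parameter) could then be scored by `average_precision(labels, predicted_scores)` on held-out data, with the highest-scoring candidate retained. Unlike accuracy, this quantity ignores true negatives, so it is not inflated by a large majority class.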

Highlights

• Precision-recall curve (PRC) as a criterion for variable selection of class-imbalanced data.

• A novel algorithm (SRLRS) is proposed for dealing with class-imbalanced data.

• A novel subsampling (LHO-LOO) strategy for class-imbalanced data is designed for stable variable selection.

• Sparse regularized methods are successfully used for class-imbalanced data.
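The LHO-LOO subsampling idea named in the highlights can be sketched as follows. This is one possible reading of "leaving half of majority observations out and leaving one minority observation out", not the authors' implementation: the function names `lho_loo_subsamples` and `selection_frequency`, the random choice of the left-out minority observation, and the use of selection frequency as the stability summary are all assumptions. Fitting the sparse regularized logistic regression on each subsample is left to any L1-penalized solver.

```python
import random


def lho_loo_subsamples(labels, n_rounds, minority=1, seed=0):
    """Yield index lists; each round drops a random half of the majority
    class and one randomly chosen minority observation (assumed scheme)."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if len(minority_idx) < 2:
        raise ValueError("need at least two minority observations")

    n_keep = len(majority_idx) - len(majority_idx) // 2
    for _ in range(n_rounds):
        kept_majority = rng.sample(majority_idx, n_keep)
        left_out = rng.choice(minority_idx)
        kept_minority = [i for i in minority_idx if i != left_out]
        yield sorted(kept_majority + kept_minority)


def selection_frequency(selected_sets, n_variables):
    """Fraction of subsamples in which each variable was selected."""
    selected_sets = list(selected_sets)
    counts = [0] * n_variables
    for selected in selected_sets:
        for j in selected:
            counts[j] += 1
    return [c / len(selected_sets) for c in counts]
```

In use, a sparse model would be fitted on each index list, the indices of its nonzero coefficients collected into a set, and the sets passed to `selection_frequency`; variables with a high frequency across subsamples would be treated as stably selected.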

Bibliographic details

Source
Chemometrics and Intelligent Laboratory Systems | 2017, issue 2017 | 10 pages
Authors
Guang-Hui Fu; Feng Xu; Bing-Yang Zhang; Lun-Zhao Yi
Author affiliations

School of Science, Kunming University of Science and Technology

School of Science, Kunming University of Science and Technology

School of Science, Kunming University of Science and Technology

Yunnan Food Safety Research Institute, Kunming University of Science and Technology

Original format: PDF
Language: English
Chinese Library Classification: Metrology
Keywords
Precision-recall curve; Class-imbalanced data; Sparse regularization logistic regression; Stable variable selection; Subsampling
