Stable variable selection of class-imbalanced data with precision-recall criterion
Journal: Chemometrics and Intelligent Laboratory Systems

Abstract

Screening important variables in class-imbalanced data remains a challenging task. In this study, we propose an algorithm for stably selecting key variables in class-imbalanced data based on the precision-recall curve (PRC). The PRC is used as the assessment criterion in the model-building stage, and sparse regularized logistic regression combined with subsampling (SRLRS) is designed to perform stable variable selection. Considering the characteristics of class-imbalanced data, we also propose a classification-based partition for cross-validation, as well as a subsampling scheme that leaves half of the majority observations out and one minority observation out (LHO-LOO). Results on simulated and real data show that our algorithm is highly suitable for handling class-imbalanced data, and that the PRC can serve as an alternative evaluation criterion for model selection when handling such data.

Highlights

• The precision-recall curve (PRC) is used as a criterion for variable selection on class-imbalanced data.
• A novel algorithm (SRLRS) is proposed for dealing with class-imbalanced data.
• A novel subsampling strategy (LHO-LOO) is designed for stable variable selection on class-imbalanced data.
• Sparse regularized methods are successfully applied to class-imbalanced data.
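
The two ingredients named in the abstract, LHO-LOO subsampling and a precision-recall criterion, can be illustrated with a minimal Python sketch. This is one reading of the abstract rather than the authors' implementation: the function names `lho_loo_subsample` and `average_precision` are hypothetical, the subsample is assumed to be drawn uniformly at random, and the area under the PRC is approximated by step-wise average precision. The sparse regularized logistic regression step is not shown.

```python
import random


def lho_loo_subsample(labels, minority_label=1, rng=None):
    """Return the sorted indices kept in one LHO-LOO-style subsample.

    Assumption: half of the majority observations and exactly one
    minority observation are left out, both chosen at random.
    Requires at least one minority observation.
    """
    rng = rng or random.Random()
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Leave half of the majority observations out (keep the other half).
    kept_majority = rng.sample(majority, len(majority) - len(majority) // 2)
    # Leave one minority observation out.
    left_out = rng.choice(minority)
    kept_minority = [i for i in minority if i != left_out]
    return sorted(kept_majority + kept_minority)


def average_precision(labels, scores, positive_label=1):
    """Step-wise area under the precision-recall curve.

    Observations are ranked by decreasing score; tied scores are not
    grouped, so ties are broken by input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(1 for y in labels if y == positive_label)
    if n_pos == 0:
        raise ValueError("no positive observations")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == positive_label:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos
```

In a stability-selection setting, one would presumably repeat `lho_loo_subsample` many times, fit a sparse model on each subsample, and count how often each variable is selected, with a PRC-based score such as `average_precision` guiding the choice of the regularization parameter.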
