
Analysis of sampling techniques for imbalanced data: An n=648 ADNI study

Abstract

Many neuroimaging applications deal with imbalanced imaging data. For example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study outnumber the Alzheimer's disease (AD) patients by nearly two to one for the structural magnetic resonance imaging (MRI) modality, and outnumber the control cases by six to one for the proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize overall prediction accuracy tend to assign all data to the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and combined over- and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated with two classifiers, Random Forest and Support Vector Machines, using classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids-based undersampling gives the best overall performance among the data sampling techniques considered, and also outperforms the no-sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among the feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results.
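
The K-Medoids undersampling step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a common reading of the technique: the majority class is clustered into as many clusters as there are minority-class samples, and each cluster is represented by its medoid, which yields a balanced training set. It also assumes Euclidean distance and a simple alternating (Voronoi-iteration) K-Medoids rather than PAM. The function names `kmedoids`, `kmedoids_undersample`, and `sensitivity_specificity` are hypothetical, and only NumPy is required.

```python
import numpy as np


def kmedoids(X, k, n_iter=100, seed=0):
    """Alternating (Voronoi-iteration) k-medoids with Euclidean distance.

    Returns the row indices of X chosen as the k medoids.
    """
    rng = np.random.default_rng(seed)
    # Full pairwise distance matrix; fine for a few hundred subjects.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        # Assign every point to its nearest current medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid (only happens with duplicate points)
            # New medoid: the member with the smallest total distance to the rest.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    return medoids


def kmedoids_undersample(X, y, majority_label, seed=0):
    """Balance a binary dataset by keeping only k medoids of the majority class.

    Assumption: k equals the minority-class size (a 1:1 ratio); the paper
    also studies other sampling rates.
    """
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    chosen = maj[kmedoids(X[maj], k=len(mino), seed=seed)]
    keep = np.concatenate([chosen, mino])
    return X[keep], y[keep]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pos = y_true == positive
    sens = np.mean(y_pred[pos] == positive)
    spec = np.mean(y_pred[~pos] != positive)
    return sens, spec


if __name__ == "__main__":
    # Synthetic 3:1 imbalanced data (illustrative only, not ADNI data).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (60, 5)), rng.normal(2, 1, (20, 5))])
    y = np.array([0] * 60 + [1] * 20)
    Xb, yb = kmedoids_undersample(X, y, majority_label=0)
    print(np.bincount(yb))  # [20 20]
```

Rerunning `kmedoids_undersample` with different `seed` values gives several different balanced training sets. One plausible way to build an ensemble of multiple undersampled datasets is to train one classifier per set (for example, a Random Forest or an SVM) and combine their votes. The abstract does not specify how the ensemble members are combined, so this is an assumption.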

Bibliographic details

  • Source
    NeuroImage | 2014 | 22 pages
  • Author affiliations

    School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe;

    School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe;

    School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe;

    Imaging Genetics Center, Laboratory of Neuro Imaging, UCLA School of Medicine, Los Angeles, CA;

    School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe;

  • Original format: PDF
  • Language: English
  • CLC classification: Diagnostics
  • Keywords

    Alzheimer's disease; Classification; Feature selection; Imbalanced data; Oversampling; Undersampling;
