Journal: Chem-Bio Informatics Journal

A Novel Over-Sampling Method and its Application to Cancer Classification from Gene Expression Data

Abstract

One of the most critical and frequent problems in biomedical data classification is imbalanced class distribution, where samples from the majority class significantly outnumber those from the minority class. SMOTE is a well-known general over-sampling method used to address this problem; however, in some cases it fails to improve classification performance or even reduces it. To address this, we have developed a novel minority over-sampling method named safe-SMOTE. Experimental results on two gene expression datasets for cancer classification (colon-cancer and leukemia) and six imbalanced benchmark datasets from the UCI Machine Learning Repository showed that our method achieved better sensitivity and G-mean values than both the control method (no over-sampling) and SMOTE. For example, on the colon-cancer dataset, the sensitivity and specificity achieved by SMOTE (81.36% and 88.63%) were lower than those of the control method (81.59% and 89.50%), whereas safe-SMOTE increased both (81.82% and 90.50%). Similarly, the G-mean value of the control (85.45%) decreased to 84.91% when SMOTE was employed but increased to 86.04% with safe-SMOTE. On the leukemia dataset, SMOTE improved the sensitivity and G-mean values relative to the control; however, safe-SMOTE achieved even greater improvements on both criteria.
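The abstract does not state how G-mean is computed. A common definition, assumed here, is the geometric mean of sensitivity and specificity. The minimal sketch below applies that assumed definition to the colon-cancer figures quoted in the abstract; the function name `g_mean` and the dictionary layout are illustrative only and do not come from the paper.

```python
import math


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity (fractions in [0, 1]).

    Assumed definition; the abstract does not spell out the formula.
    """
    return math.sqrt(sensitivity * specificity)


# Colon-cancer (sensitivity, specificity) pairs quoted in the abstract,
# converted from percentages to fractions.
reported = {
    "control": (0.8159, 0.8950),
    "SMOTE": (0.8136, 0.8863),
    "safe-SMOTE": (0.8182, 0.9050),
}

g_means = {name: g_mean(se, sp) for name, (se, sp) in reported.items()}

for name, value in g_means.items():
    print(f"{name}: G-mean = {100 * value:.2f}%")
```

Under this assumption, the recomputed values fall within a few hundredths of a percentage point of the reported G-means (85.45%, 84.91%, 86.04%) and keep the same ordering. The small residual differences may come from rounding or from averaging over cross-validation folds, which the abstract does not specify.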