Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Sample Subset Optimization for Classifying Imbalanced Biological Data

Abstract

Data in many biological problems are often complicated by an imbalanced class distribution; that is, the positive examples may be largely outnumbered by the negative examples. Many classification algorithms, such as the support vector machine (SVM), are sensitive to data with an imbalanced class distribution and produce suboptimal classification as a result. It is therefore desirable to compensate for the imbalance effect during model training to obtain more accurate classification. In this study, we propose a sample subset optimization technique for classifying biological data with moderately and extremely imbalanced class distributions. By using this optimization technique with an ensemble of SVMs, we build multiple roughly balanced SVM base classifiers, each trained on an optimized sample subset. The experimental results demonstrate that the ensemble of SVMs created by our sample subset optimization technique can achieve a higher area under the ROC curve (AUC) than popular sampling approaches such as random over-/under-sampling and SMOTE sampling, and than the sampling used in widely used ensemble approaches such as bagging and boosting.
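The general shape of an ensemble of roughly balanced base classifiers scored by AUC can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not say how the sample subsets are optimized, so the sketch substitutes plain random under-sampling of the majority class (one of the baselines the abstract mentions). The base learner is a small linear SVM trained by stochastic subgradient descent on the regularized hinge loss, and all function names, hyperparameters, and the synthetic two-dimensional data are assumptions made for illustration.

```python
import random

def train_linear_svm(X, y, epochs=100, eta=0.01, lam=0.01, seed=0):
    """Linear SVM via SGD on the L2-regularized hinge loss; labels are -1/+1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def balanced_subsets(y, n_models, rng):
    """Stand-in for subset optimization: keep every minority example and
    draw an equally sized random sample of the majority class."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_models)]

def train_ensemble(X, y, n_models=10, seed=0):
    """Train one linear SVM per balanced subset."""
    rng = random.Random(seed)
    models = []
    for k, idx in enumerate(balanced_subsets(y, n_models, rng)):
        models.append(train_linear_svm([X[i] for i in idx],
                                       [y[i] for i in idx], seed=seed + k))
    return models

def ensemble_score(models, x):
    """Average the decision values of all base classifiers."""
    total = sum(sum(wj * xj for wj, xj in zip(w, x)) + b for w, b in models)
    return total / len(models)

def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; ties count as half."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(n_neg, n_pos, rng):
    """Synthetic imbalanced data: two Gaussian blobs in two dimensions."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(n_pos)]
    return X, [-1] * n_neg + [1] * n_pos

data_rng = random.Random(42)
X_train, y_train = make_data(200, 20, data_rng)  # 10:1 class imbalance
X_test, y_test = make_data(200, 20, data_rng)
models = train_ensemble(X_train, y_train)
test_auc = auc([ensemble_score(models, x) for x in X_test], y_test)
print(f"ensemble AUC on held-out synthetic data: {test_auc:.3f}")
```

Replacing `balanced_subsets` with a routine that selects subsets by some optimization criterion is where a technique such as the one proposed in the paper would plug in; the rest of the pipeline (per-subset training, score averaging, AUC evaluation) stays the same.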
