Journal: Analytica Chimica Acta

An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data

Abstract

High-throughput screening (HTS) often generates imbalanced datasets. When the imbalance is not taken into account, most classification methods tend to achieve high predictive accuracy for the majority class but markedly poor performance for the minority class. In this work, an efficient algorithm, GLMBoost, coupled with the Synthetic Minority Over-sampling TEchnique (SMOTE), is developed and applied to overcome this problem for several imbalanced datasets from PubChem BioAssay. With the proposed combined method, the rare samples (active compounds), for which results are usually poor, can be detected with high balanced accuracy (Gmean). For comparison, Random Forest (RF) combined with SMOTE is also used to classify the same datasets. Our results show that GLMBoost+SMOTE not only performs better than RF+SMOTE, as measured by the percentage of correctly classified rare samples (Sensitivity) and by Gmean, but is also more computationally efficient. We therefore hope that the proposed combination of GLMBoost and SMOTE can be widely used to tackle the imbalanced classification problem. Published by Elsevier B.V.
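
The abstract names SMOTE and Gmean without defining them, so the following is a minimal illustrative sketch of both ideas and not the authors' implementation. It assumes the commonly used formulations: SMOTE creates a synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbours, and Gmean is the geometric mean of sensitivity and specificity. The function names `smote` and `g_mean`, the parameters `k` and `seed`, and the toy data are hypothetical and chosen only for this example; the GLMBoost classifier itself is not reproduced here.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.

    Assumes `minority` holds at least two feature vectors (lists of floats).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition).

    Assumes y_true contains at least one positive and one negative label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # fraction of rare (active) samples found
    specificity = tn / (tn + fp)  # fraction of majority samples found
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    actives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy minority class
    print(smote(actives, n_synthetic=4, k=2))

    # A classifier that always predicts "inactive" is 90% accurate here,
    # yet its Gmean is 0 because it finds none of the actives.
    y_true = [1] + [0] * 9
    y_pred = [0] * 10
    print(g_mean(y_true, y_pred))
```

The second example shows why the abstract reports Sensitivity and Gmean instead of plain accuracy: on imbalanced data, a model can score well on accuracy while missing every active compound. In a workflow of the kind described, oversampling would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data.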
