Conference: International Conference on Machine Learning and Applications

Contrasting Undersampled Boosting with Internal and External Feature Selection for Patient Response Datasets



Abstract

Class imbalance (where one class has many more instances than the other class(es)) and high dimensionality (a large number of features per instance) are two problems frequently present in patient response datasets. Beyond these problems, such datasets are notoriously difficult to build effective models from. This paper introduces a new hybrid boosting algorithm, SelectRUSBoost, which applies both data sampling and feature selection in every iteration of boosting. We compare SelectRUSBoost against RUSBoost combined with external feature selection on five patient response datasets, using two classifiers, three filter-based feature selection techniques, and four feature subset sizes. Our results show that, with few exceptions, SelectRUSBoost outperforms RUSBoost combined with external feature selection. Information gain outperformed the other feature selection techniques for every combination of boosting approach, classifier, and feature subset size, and with information gain, SelectRUSBoost outperformed RUSBoost combined with external selection in every case, without exception. Statistical analysis confirmed that SelectRUSBoost performs better than RUSBoost combined with external selection. This is the first work to use SelectRUSBoost in a bioinformatics study.
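The abstract does not spell out SelectRUSBoost's internals, so what follows is only a minimal Python sketch of the idea as described: random undersampling plus information-gain feature selection redone in every boosting round (`internal=True`), contrasted with selecting features once before boosting (`internal=False`, i.e. RUSBoost with external selection). Several details are assumptions, not taken from the paper: labels are +1 (minority) and -1 (majority), features are already discretized, undersampling balances the classes 1:1, a weighted decision stump stands in for the paper's two classifiers, and the weight update follows plain AdaBoost. All function names (`boost`, `fit_stump`, `top_k_features`, `random_undersample`) are hypothetical.

```python
import math
import random
from collections import Counter

# Assumptions (not from the paper): labels are +1 (minority, e.g. responders)
# and -1 (majority); features are already discretized to small integers.


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for one feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(column):
        subset = [y for x, y in zip(column, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_k_features(X, y, rows, k):
    """Filter-based selection: rank features by IG on `rows`, keep the top k."""
    labels = [y[i] for i in rows]
    scores = [information_gain([X[i][j] for i in rows], labels)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def random_undersample(y, rng):
    """Keep every minority row plus an equal number of random majority rows."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == -1]
    return minority + rng.sample(majority, min(len(minority), len(majority)))


def fit_stump(X, y, w, rows, features):
    """Weighted decision stump: predict `sign` if row[j] == v, else -sign."""
    best = None
    for j in features:
        for v in {X[i][j] for i in rows}:
            for sign in (1, -1):
                err = sum(w[i] for i in rows
                          if (sign if X[i][j] == v else -sign) != y[i])
                if best is None or err < best[0]:
                    best = (err, j, v, sign)
    _, j, v, sign = best
    return lambda row: sign if row[j] == v else -sign


def boost(X, y, k, rounds=10, seed=0, internal=True):
    """AdaBoost-style ensemble with random undersampling in every round.

    internal=True:  select features inside each round, on the undersampled
                    rows (the SelectRUSBoost idea).
    internal=False: select features once on all rows before boosting
                    (RUSBoost with external feature selection).
    """
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    fixed = None if internal else top_k_features(X, y, range(n), k)
    ensemble = []
    for _ in range(rounds):
        rows = random_undersample(y, rng)
        features = top_k_features(X, y, rows, k) if internal else fixed
        h = fit_stump(X, y, w, rows, features)
        preds = [h(row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified rows, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, h))
    return lambda row: 1 if sum(a * g(row) for a, g in ensemble) >= 0 else -1


if __name__ == "__main__":
    # Toy data: feature 0 marks the 4 responders; features 1 and 2 are noise.
    X = [[1, i % 2, (i // 2) % 2] for i in range(4)]
    X += [[0, i % 2, (i // 3) % 2] for i in range(12)]
    y = [1] * 4 + [-1] * 12
    for internal in (True, False):
        model = boost(X, y, k=1, rounds=5, internal=internal)
        # prints "True 16", then "False 16"
        print(internal, sum(model(r) == t for r, t in zip(X, y)))
```

The only difference between the two modes is where `top_k_features` is called: inside the loop on each undersampled subset, or once on the full training set. On this toy data both find feature 0, so they agree; the contrast the paper studies only shows up on real high-dimensional data.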
