Conference: International Conference on Machine Learning and Applications

Contrasting Undersampled Boosting with Internal and External Feature Selection for Patient Response Datasets

Abstract

Class imbalance (where one class has many more instances than the other class or classes) and high dimensionality (a large number of features per instance) are two problems frequently present in patient response datasets. Beyond these problems, such datasets are notoriously difficult to build effective models from. This paper introduces a new hybrid boosting algorithm named SelectRUSBoost, which performs both data sampling and feature selection within every iteration of boosting. We test SelectRUSBoost against RUSBoost combined with external feature selection on a set of five patient response datasets. The experiments also use two classifiers, three filter-based feature selection techniques, and four feature subset sizes. Our results show that SelectRUSBoost, with few exceptions, outperforms RUSBoost combined with external feature selection. In addition, the information gain feature selection technique outperformed the other techniques for all combinations of boosting approach, classifier, and feature subset size, and with this technique SelectRUSBoost always (without exception) outperformed RUSBoost combined with external selection. Statistical analysis confirmed that SelectRUSBoost gives better performance than RUSBoost combined with external selection. This is the first work that uses SelectRUSBoost in a bioinformatics study.
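The idea of sampling and selecting features inside each boosting round can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithm's details, so everything below is an assumption made for illustration, including binary 0/1 features and labels, one-feature rules as the weak learner, an AdaBoost.M1-style weight update, and all function names (`select_rus_boost`, `undersample`, `info_gain`, and so on).

```python
import math
import random

def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def info_gain(X, y, j):
    """Information gain of binary feature j with respect to labels y."""
    gain = entropy(y)
    for v in (0, 1):
        subset = [yi for xi, yi in zip(X, y) if xi[j] == v]
        gain -= len(subset) / len(y) * entropy(subset)
    return gain

def undersample(y, rng):
    """All minority indices plus an equal-size random draw of majority indices."""
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))

def fit_stump(X, y, w, idx, features):
    """Lowest weighted-error rule 'predict 1 when x[j] == polarity' on the sample."""
    total = sum(w[i] for i in idx)
    best = None
    for j in features:
        for polarity in (0, 1):
            err = sum(w[i] for i in idx
                      if (X[i][j] == polarity) != (y[i] == 1)) / total
            if best is None or err < best[0]:
                best = (err, j, polarity)
    return best[1], best[2]

def stump_predict(stump, x):
    j, polarity = stump
    return 1 if x[j] == polarity else 0

def select_rus_boost(X, y, n_rounds=10, k=2, seed=0):
    """Boosting loop that undersamples and re-selects k features every round."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        idx = undersample(y, rng)                      # data sampling step
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        ranked = sorted(range(len(X[0])),              # filter-based ranking
                        key=lambda j: info_gain(Xs, ys, j), reverse=True)
        stump = fit_stump(X, y, w, idx, ranked[:k])    # weak learner on top-k
        wrong = [stump_predict(stump, X[i]) != y[i] for i in range(n)]
        err = sum(w[i] for i in range(n) if wrong[i])  # error on the full set
        err = min(max(err, 1e-10), 1 - 1e-10)          # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [w[i] * math.exp(alpha if wrong[i] else -alpha) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the stumps, mapped back to a 0/1 label."""
    score = sum(a * (1 if stump_predict(s, x) == 1 else -1) for a, s in ensemble)
    return 1 if score > 0 else 0
```

The "external" alternative described in the abstract would instead rank features once, before boosting starts, and reuse that fixed subset in every round. In this sketch that corresponds to moving the `ranked` computation out of the loop.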