Journal: Journal of Biological Research

Iterative ensemble feature selection for multiclass classification of imbalanced microarray data

Abstract

Background: Microarray technology allows biologists to monitor the expression levels of thousands of genes across various tumor tissues. Identifying the genes relevant to classifying samples of different tumor types benefits clinical studies. One of the most widely used strategies for multiclass classification is the One-Versus-All (OVA) schema, which divides the original problem into multiple binary classifications of one class against the rest. Nevertheless, multiclass microarray data tend to suffer from an imbalanced class distribution between majority and minority classes, which inevitably degrades the performance of OVA classification.

Results: In this study, we propose a novel iterative ensemble feature selection (IEFS) framework for multiclass classification of imbalanced microarray data. In particular, filter feature selection and balanced sampling are performed iteratively and alternately to boost the performance of each binary classification in the OVA schema. The proposed framework is tested and compared with other representative state-of-the-art filter feature selection methods on six benchmark multiclass microarray data sets. The experimental results show that the IEFS framework provides superior or comparable performance to the other methods in terms of both classification accuracy and area under the receiver operating characteristic curve. The more classes the data have, the better the IEFS framework performs.

Conclusions: Balanced sampling and feature selection work well together in improving multiclass classification of imbalanced microarray data. The IEFS framework is readily applicable to other biological data analysis tasks facing the same problem.
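The abstract does not give the IEFS algorithm itself, so the following is only a loose sketch of the general idea it describes: for one OVA subproblem, alternate balanced sampling with a filter score and pool the results over several rounds. The signal-to-noise score, the vote-based pooling, and every function and parameter name (`ova_labels`, `balanced_indices`, `snr_score`, `ensemble_select`, `rounds`, `k`) are assumptions made for illustration, not the authors' method.

```python
import random
from statistics import mean, pstdev


def ova_labels(y, positive):
    """Relabel a multiclass target as one class (1) versus the rest (0)."""
    return [1 if label == positive else 0 for label in y]


def balanced_indices(y_bin, rng):
    """Undersample the larger group so both groups have equal size."""
    pos = [i for i, v in enumerate(y_bin) if v == 1]
    neg = [i for i, v in enumerate(y_bin) if v == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


def snr_score(values, y_bin):
    """Assumed filter score: |gap between group means| / (sum of std devs)."""
    a = [v for v, t in zip(values, y_bin) if t == 1]
    b = [v for v, t in zip(values, y_bin) if t == 0]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def ensemble_select(X, y, positive, k, rounds=10, seed=0):
    """Pick k features for one OVA subproblem by voting over balanced samples."""
    rng = random.Random(seed)
    y_bin = ova_labels(y, positive)
    n_features = len(X[0])
    votes = [0] * n_features
    for _ in range(rounds):
        # Step 1: draw a class-balanced subsample of this binary problem.
        idx = balanced_indices(y_bin, rng)
        sub_y = [y_bin[i] for i in idx]
        # Step 2: score every feature on that subsample only.
        scores = [
            snr_score([X[i][j] for i in idx], sub_y) for j in range(n_features)
        ]
        # Step 3: the k best-scoring features each receive one vote.
        top = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
        for j in top:
            votes[j] += 1
    # Features chosen most often across rounds form the final subset.
    return sorted(range(n_features), key=lambda j: votes[j], reverse=True)[:k]
```

In a full OVA setup, `ensemble_select` would be called once per class, each call yielding a feature subset for that class's binary classifier. Undersampling is only one way to balance the classes; the abstract does not say which sampling scheme IEFS uses.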
