Journal: Bioinformatics

Robust biomarker identification for cancer diagnosis with ensemble feature selection methods


Abstract

Motivation: Biomarker discovery is an important topic in biomedical applications of computational biology, including applications such as gene and SNP selection from high-dimensional data. Surprisingly, the robustness of such selection processes, that is, their stability with respect to sampling variation, has received attention only recently. Yet the robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validations. In addition, a more robust set of markers may strengthen an expert's confidence in the results of a selection method.

Results: Our first contribution is a general framework for analyzing the robustness of a biomarker selection algorithm. Second, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, in which multiple feature selections are combined to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-of-the-art performance on several diagnosis and prognosis tasks on biological data. Their feature selection extensions have also given good results on gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving classification performance. The proposed methodology is evaluated on four microarray datasets, showing increases of up to almost 30% in the robustness of the selected biomarkers, along with an improvement of about 15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which are the most relevant for designing a diagnosis or prognosis model from a gene signature.
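The idea of ensemble feature selection described above can be illustrated with a small sketch. It is not the authors' implementation, and several choices in it are assumptions made only for illustration. The ranker scores each feature by the absolute difference of class means, a simple stand-in for the SVM-embedded selection the abstract refers to. The ensemble is built from stratified bootstrap samples and aggregated by summing ranks. Stability is measured as the mean pairwise Jaccard overlap between signatures. The abstract specifies none of these details, and all function names (`rank_features`, `bootstrap`, `ensemble_select`, `stability`, `make_data`) are hypothetical.

```python
import random
from statistics import mean


def rank_features(X, y):
    """Return a rank per feature (0 = best), scored by |mean(class 1) - mean(class 0)|.

    Assumption: this simple scorer stands in for an SVM-based ranking.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def bootstrap(X, y, rng):
    """Stratified bootstrap: resample with replacement within each class."""
    idx = []
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        idx.extend(rng.choices(members, k=len(members)))
    return [X[i] for i in idx], [y[i] for i in idx]


def ensemble_select(X, y, k, n_bags=20, seed=0):
    """Rank features on n_bags bootstrap samples and keep the k best by summed rank."""
    rng = random.Random(seed)
    n_features = len(X[0])
    total = [0] * n_features
    for _ in range(n_bags):
        Xb, yb = bootstrap(X, y, rng)
        for j, r in enumerate(rank_features(Xb, yb)):
            total[j] += r
    return set(sorted(range(n_features), key=lambda j: total[j])[:k])


def stability(signatures):
    """Mean pairwise Jaccard overlap between signatures; 1.0 means identical sets."""
    pairs = [(a, b) for i, a in enumerate(signatures) for b in signatures[i + 1:]]
    return mean(len(a & b) / len(a | b) for a, b in pairs)


def make_data(n_samples, n_features, n_informative, rng):
    """Synthetic two-class data; the first n_informative features are shifted in class 1."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    X, y = make_data(40, 30, 3, rng)
    # Select a signature on several resampled versions of the data, then
    # measure how much the resulting signatures agree with each other.
    signatures = [ensemble_select(*bootstrap(X, y, rng), k=3, seed=s) for s in range(5)]
    print("signatures:", signatures)
    print("stability:", stability(signatures))
```

In this sketch, robustness is assessed by repeating the selection on perturbed versions of the training data and comparing the resulting signatures. The same loop could be run with `n_bags=1` to compare a single selection against the ensemble on a given dataset.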
