Scientific Reports

CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network


Abstract

With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
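
The abstract does not say how the random forest and the deep neural network are combined. As a minimal sketch, assuming the two models' per-class probabilities are simply averaged (soft voting), the ensemble step might look like the following. The function name `soft_vote`, the `weight_a` parameter, and the probability values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def soft_vote(
    prob_a: Sequence[float],
    prob_b: Sequence[float],
    weight_a: float = 0.5,
) -> Tuple[int, List[float]]:
    """Blend two per-class probability vectors and pick the top class.

    Assumption: weighted averaging of probabilities. The source abstract
    does not specify the combination rule used by CPEM.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    # Weighted average of the two models' class probabilities.
    combined = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]
    # Index of the class with the highest combined probability.
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


# Hypothetical outputs for one sample over three cancer types.
rf_probs = [0.50, 0.30, 0.20]   # illustrative random forest probabilities
dnn_probs = [0.20, 0.70, 0.10]  # illustrative neural network probabilities
label, probs = soft_vote(rf_probs, dnn_probs)
```

Here the random forest alone would pick the first class, but the averaged probabilities favor the second, which shows how an ensemble can overrule a single model's prediction.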
