Frontiers in Molecular Biosciences

Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations


Abstract

Development of machine learning solutions for predicting the functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the past decade. In this work, we integrate different machine learning approaches, including tree-based methods, namely random forest and gradient boosted tree (GBT) classifiers, along with deep convolutional neural networks (CNN), for prediction of cancer driver mutations in genomic datasets. The feasibility of using CNN on raw nucleotide sequences for classification of cancer driver mutations was initially explored by employing label encoding, one-hot encoding, and embedding to preprocess the DNA information. These classifiers were benchmarked against their tree-based alternatives in order to evaluate performance on a relative scale. We then integrated DNA-based scores generated by CNN with various categories of conservational, evolutionary and functional features into a generalized random forest classifier. The results of this study demonstrate that CNN can learn high-level features from genomic information that are complementary to the ensemble-based predictors often employed for classification of cancer mutations. By combining the deep learning-generated score with only two main ensemble-based functional features, we can achieve superior performance across various machine learning classifiers. Our findings also suggest that the synergy of nucleotide-based deep learning scores and integrated metrics derived from protein sequence conservation scores can allow for robust classification of cancer driver mutations with a limited number of highly informative features.
Machine learning predictions are leveraged in molecular simulations, protein stability, and network-based analysis of cancer mutations in the protein kinase genes to obtain insights about molecular signatures of driver mutations and enhance the interpretability of cancer-specific classification models.
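
As an illustration of the DNA preprocessing step named in the abstract, the following is a minimal sketch of label encoding and one-hot encoding for a nucleotide sequence. The alphabet ordering, the handling of unknown bases, and the function names `label_encode` and `one_hot_encode` are assumptions made for this example; the abstract does not specify the mapping the authors used.

```python
# Assumed alphabet ordering; the mapping used in the paper is not stated in the abstract.
NUCLEOTIDES = "ACGT"
LABELS = {base: index for index, base in enumerate(NUCLEOTIDES)}


def label_encode(seq):
    """Map each base to an integer index; unknown bases (e.g. N) map to -1."""
    return [LABELS.get(base, -1) for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector; unknown bases become all zeros."""
    rows = []
    for index in label_encode(seq):
        row = [0] * len(NUCLEOTIDES)
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    window = "ACGTN"  # hypothetical sequence window around a mutation site
    print(label_encode(window))
    print(one_hot_encode(window))
```

In an embedding-based variant, the integer labels would instead index into a learned lookup table inside the network, so only the label encoding step would be done during preprocessing.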
