Computational and Structural Biotechnology Journal

Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features

Abstract

Genes are termed essential if their loss of function compromises viability or results in a profound loss of fitness. On the genome scale, these genes can be determined experimentally using RNAi or knockout screens, but this is very resource intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic features (e.g. from the protein sequence) as well as extrinsic features (e.g. from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in Drosophila melanogaster. A total of 27,340 features were generated based on a large variety of different aspects comprising nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Employing cross-validation, we obtained an excellent prediction performance. The best model for D. melanogaster achieved a ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method in which only features derived from the protein sequences were used (P < 0.001). Investigating which features contributed to this success, we found that all categories of features contributed, most prominently network topological, functional and sequence-based features. To evaluate our approach, we performed the same workflow for essential gene prediction in human and achieved ROC-AUC = 0.97, PR-AUC = 0.73, and F1 = 0.64. In summary, this study shows that our well-elaborated assembly of features, covering a broad range of intrinsic and extrinsic gene and protein features, enabled intelligent systems to predict the essentiality of genes in an organism well.
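
The abstract reports three metrics that can diverge sharply: a ROC-AUC of 0.90 alongside a PR-AUC of 0.30 and an F1 score of 0.34. One common reason for such a gap is a small positive class, although the abstract does not state the proportion of essential genes. The following is a minimal sketch of how the three metrics are computed, using only the Python standard library. The toy labels, the toy scores and the threshold are invented for illustration and are not data from the study, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / sum(labels)


def f1_score(labels, scores, threshold):
    """Harmonic mean of precision and recall after thresholding the scores."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 20 genes, 2 of them essential (label 1).
# Scores are integers 20..1, so the list is already in ranked order;
# the essential genes sit at ranks 3 and 5.
toy_labels = [0, 0, 1, 0, 1] + [0] * 15
toy_scores = list(range(20, 0, -1))

toy_roc_auc = roc_auc(toy_labels, toy_scores)            # 31/36, about 0.861
toy_pr_auc = average_precision(toy_labels, toy_scores)   # (1/3 + 2/5) / 2, about 0.367
toy_f1 = f1_score(toy_labels, toy_scores, threshold=16)  # 4/7, about 0.571

print(round(toy_roc_auc, 3), round(toy_pr_auc, 3), round(toy_f1, 3))
```

In this toy ranking both positives beat most negatives, so ROC-AUC is high, but the few negatives ranked above them are enough to pull precision down because positives are rare. This is why PR-AUC and F1 are usually read alongside ROC-AUC when positives are rare.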
