Journal of Grid Computing

Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins


Abstract

Accurate annotation of protein functions is important for a thorough understanding of molecular biology. A large number of proteins remain uncharacterized because supporting information is sparse. For many uncharacterized proteins, the only information available is the amino acid sequence. This motivates the development of sequence-based computational techniques that can precisely annotate uncharacterized proteins. In this paper, we propose DeepSeq, a deep learning architecture that uses only protein sequence information to predict the functions associated with a protein. The prediction process does not require handcrafted features; rather, the architecture automatically extracts representations from the input sequence data. Our experiments with DeepSeq indicate significant improvements in prediction accuracy compared with other sequence-based methods. The model achieves an overall validation accuracy of 86.72%, with an F1 score of 71.13%. These improved results for the protein function prediction problem were obtained using sequence information only. Moreover, using the automatically learned features and without any changes to DeepSeq, we successfully solved a different problem, namely protein function localization, with no human intervention. Finally, we discuss how the same architecture can be used to solve even more complicated problems, such as prediction of 2D and 3D structure and of protein-protein interactions.
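As a rough illustration of two ideas in the abstract, the sketch below shows one common way to turn a raw amino acid sequence into model input without handcrafted features, and why an accuracy figure can sit well above an F1 score when function labels are sparse. It is not the DeepSeq implementation. The vocabulary, the padding and unknown indices, the function names, and the element-wise definition of accuracy are all assumptions made for this example; the abstract does not state how its metrics were computed.

```python
# Hypothetical sketch, not taken from the DeepSeq paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD, UNK = 0, 1                       # assumed padding / unknown-residue indices
AA_TO_INDEX = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_len):
    """Map a protein sequence to a fixed-length list of integer indices."""
    ids = [AA_TO_INDEX.get(aa, UNK) for aa in seq.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct (assumed definition)."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(t == p for t, p in pairs) / len(pairs)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all label decisions; ignores true negatives."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


if __name__ == "__main__":
    print(encode_sequence("ACD", 5))
    # Two proteins, four candidate function labels each (toy data).
    y_true = [[1, 0, 0, 0], [0, 0, 0, 1]]
    y_pred = [[1, 0, 0, 0], [0, 1, 0, 0]]
    print(label_accuracy(y_true, y_pred), micro_f1(y_true, y_pred))
```

In the toy data, most label decisions are correct negatives. Accuracy counts these and F1 does not, so accuracy comes out higher than F1. A gap of this kind is one plausible reading of the two reported figures, not a claim about how the authors computed them.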
