
A Unified Multitask Architecture for Predicting Local Protein Properties



Abstract

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single-task neural network approach, and that the resulting model achieves state-of-the-art performance.
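
The idea of a shared representation feeding several per-residue labeling tasks can be illustrated with a minimal, untrained sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the window width, the hidden size, the task names and label counts, the random weights, and in particular the way "synthetic" patterns are produced (replacing the centre residue of a natural window with a different amino acid). It is not the authors' implementation.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # padding symbol for window positions past the sequence ends
ALPHABET = AMINO_ACIDS + PAD

# Hypothetical tasks and label counts, chosen only for illustration.
TASKS = {"secondary_structure": 3, "solvent_accessibility": 2, "natural_vs_synthetic": 2}


def windows(seq, width=7):
    """Return one fixed-width window per residue, padded at both ends."""
    half = width // 2
    padded = PAD * half + seq + PAD * half
    return [padded[i:i + width] for i in range(len(seq))]


def corrupt(window, rng):
    """Assumed corruption scheme: swap the centre residue for a different one."""
    mid = len(window) // 2
    choices = [a for a in AMINO_ACIDS if a != window[mid]]
    return window[:mid] + rng.choice(choices) + window[mid + 1:]


def one_hot(window):
    """Concatenate a one-hot vector over ALPHABET for each window position."""
    vec = []
    for ch in window:
        vec.extend(1.0 if ch == a else 0.0 for a in ALPHABET)
    return vec


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class MultitaskNet:
    """Shared hidden layer feeding one softmax head per task (weights are random)."""

    def __init__(self, width=7, hidden=16, seed=0):
        rng = random.Random(seed)
        self.shared = random_matrix(hidden, width * len(ALPHABET), rng)
        self.heads = {t: random_matrix(n, hidden, rng) for t, n in TASKS.items()}

    def predict(self, window):
        # The same hidden vector h is reused by every task head.
        h = [math.tanh(x) for x in matvec(self.shared, one_hot(window))]
        return {t: softmax(matvec(w, h)) for t, w in self.heads.items()}


rng = random.Random(1)
seq = "MKTAYIAKQRQISFVK"  # arbitrary example sequence
net = MultitaskNet()
natural = windows(seq)
synthetic = [corrupt(w, rng) for w in natural]
outputs = [net.predict(w) for w in natural]
```

In a trained version of such a model, the supervised heads would be fit on labeled residues while the `natural_vs_synthetic` head would be fit on pairs like `natural[i]` and `synthetic[i]`, so that unlabeled sequences also shape the shared layer. The sketch only shows the data flow; it performs no training.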
