Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning

Abstract

In this work, we report the evaluation of the residue-residue contacts predicted by our three methods in the CASP12 experiment, focusing on the impact of multiple sequence alignment, residue coevolution and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments, from which coevolution-based features are derived and then integrated by a neural network to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that three factors drive the quality of predicted protein contacts: the quality and effective depth of the multiple sequence alignments, the coevolution-based features, and the machine learning integration of coevolution-based and traditional features. On the full CASP12 dataset, the coevolution-based features alone improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when the top L/5 predicted long-range contacts are evaluated. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in the alignments is 0.66.
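
The "top L/5 long-range precision" metric quoted above can be illustrated with a minimal sketch. It is not the authors' evaluation code. The abstract does not define "long-range", so the cutoff used here (sequence separation of at least 24 residues, the convention commonly used in CASP assessments) is an assumption, and the function name `top_l5_long_range_precision` and its arguments are hypothetical. The true contact map is taken as given (for example, derived from a distance threshold on the native structure).

```python
import numpy as np


def top_l5_long_range_precision(prob, contact, min_sep=24):
    """Fraction of true contacts among the top L/5 long-range predictions.

    prob    -- (L, L) array of predicted contact probabilities
    contact -- (L, L) boolean array of true contacts
    min_sep -- minimum |i - j| for a pair to count as long-range
               (assumed to be 24; not stated in the abstract)
    """
    prob = np.asarray(prob, dtype=float)
    contact = np.asarray(contact, dtype=bool)
    L = prob.shape[0]

    # Upper-triangle residue pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(L, k=min_sep)
    if i.size == 0:
        return float("nan")  # protein too short to have long-range pairs

    # Keep the L/5 highest-scoring long-range pairs (at least one).
    n_top = max(1, L // 5)
    order = np.argsort(-prob[i, j], kind="stable")[:n_top]

    # Precision = correctly predicted contacts / number of predictions kept.
    return float(contact[i[order], j[order]].mean())
```

Averaging this value over all evaluated domains would give a figure comparable in kind to the percentages reported in the abstract, under the stated assumptions.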
