Journal: Journal of molecular modeling

Fast prediction of protein domain boundaries using conserved local patterns

Abstract

Using a data-driven approach, we have found conserved motifs and secondary structural patterns in the vicinity of interior domain boundary points (dbps), without any a priori constraint on the type and number of such features and without any requirement of sequence homology. We use these motifs and patterns to rerank the solutions obtained by the well-known domain guess by size (DGS) algorithm, and we predict five solutions overall. Our method, domain boundary prediction using conserved patterns (DPCP), improves the average accuracy of the top five solutions of DGS from 71.74 % to 82.88 % for two-continuous-domain proteins, and from 21.38 % to 80.56 % for two-discontinuous-domain proteins. Considering only the top solution, accuracy rises from 0 % to 72.74 % for two-continuous-domain proteins with chain lengths up to 300 residues, and from 0 % to 62.85 % for those with up to 400 residues. For discontinuous domains, the top_min solutions of DPCP (top_min being the minimum number of solutions required to predict all dbps of a protein) improve the average accuracy of DGS prediction from 12.5 % to 76.3 % for proteins with chain lengths up to 300 residues, and from 13.33 % to 70.84 % for proteins with up to 400 residues. In our validation experiments, DPCP also outperformed domain identification from secondary structure element alignment (DomSSEA), the best method reported so far for efficient prediction of domain boundaries using predicted secondary structure. For continuous domains, the average accuracies of the topmost DomSSEA solution are 61 % and 52 % for proteins with up to 300 and 400 residues, respectively; the corresponding accuracies for the discontinuous case are 28 % and 21 %.
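The reranking and top-k evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how DPCP scores a candidate boundary or how close a prediction must be to count as correct, so the scores in `pattern_score`, the tolerance `tol`, and all function names here are hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(candidates: Sequence[int], score: Callable[[int], float]) -> List[int]:
    """Order candidate boundary positions by descending score.

    sorted() is stable, so candidates with equal scores keep their
    original order (for example, the order of an upstream predictor).
    """
    return sorted(candidates, key=score, reverse=True)


def top_k_hit(ranked: Sequence[int], true_dbps: Sequence[int], k: int, tol: int) -> bool:
    """True if every true boundary lies within +/- tol residues of
    at least one of the first k ranked candidates."""
    top = ranked[:k]
    return all(any(abs(c - t) <= tol for c in top) for t in true_dbps)


def top_k_accuracy(proteins: Sequence[Tuple[Sequence[int], Sequence[int]]],
                   k: int, tol: int) -> float:
    """Percentage of proteins whose boundaries are all covered by the top k."""
    hits = sum(top_k_hit(ranked, truth, k, tol) for ranked, truth in proteins)
    return 100.0 * hits / len(proteins)


# Hypothetical candidate positions and made-up pattern scores (assumptions).
candidates = [120, 95, 150, 60, 180]
pattern_score = {120: 0.2, 95: 0.9, 150: 0.5, 60: 0.1, 180: 0.4}

ranked = rerank(candidates, pattern_score.get)

# Two toy proteins sharing the same ranking: one boundary vs. two boundaries.
proteins = [(ranked, [100]), (ranked, [100, 175])]
top1 = top_k_accuracy(proteins, k=1, tol=10)
top5 = top_k_accuracy(proteins, k=5, tol=10)
```

In this toy data a single top-ranked candidate covers the one-boundary protein but not the two-boundary one, which is why a multi-boundary (discontinuous) case needs more than one solution to be evaluated at all.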