...
BMC Bioinformatics

Improving the accuracy of protein secondary structure prediction using structural alignment

Abstract

Background

The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Many secondary structure prediction methods now routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability that a newly identified sequence has a structural homologue is actually quite high.

Results

We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence identity >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates that this is approximately 4–5% better than any other method currently available. Assessments using non-sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%.

Conclusion

By using both sequence and structure databases and by exploiting the latest techniques in machine learning, it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server called PROTEUS, which performs these secondary structure predictions, is accessible at http://wishart.biology.ualberta.ca/proteus. For high-throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.
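The abstract describes three technical steps. The first is transferring a homologue's secondary structure onto the query through an alignment. The second is combining that with sequence-based predictions into a consensus. The third is scoring the result with Q3, the percentage of residues whose three-state label (H = helix, E = strand, C = coil) is predicted correctly. The Python sketch below illustrates these ideas and rests on several assumptions. A plain per-residue majority vote stands in for the "jury-of-experts" system. The function names, the coil default for positions with no votes, the tie-break rule, and all sequences and labels are hypothetical. This is not PROTEUS's actual implementation, which the abstract does not detail.

```python
from collections import Counter
from typing import List, Optional, Sequence

GAP = "-"


def map_homologue_ss(aligned_query: str, aligned_template: str,
                     template_ss: str) -> List[Optional[str]]:
    """Copy the template's secondary structure onto each query residue.

    Hypothetical helper: the alignment rows are gapped strings of equal length,
    and template_ss holds one H/E/C letter per (ungapped) template residue.
    Query residues aligned to a template gap get None (no structural evidence).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("alignment rows must have equal length")
    if len(template_ss) != sum(c != GAP for c in aligned_template):
        raise ValueError("need one SS letter per template residue")
    mapped: List[Optional[str]] = []
    t_idx = 0  # index into template_ss (ungapped template position)
    for q, t in zip(aligned_query, aligned_template):
        if q != GAP:
            mapped.append(template_ss[t_idx] if t != GAP else None)
        if t != GAP:
            t_idx += 1  # advance past this template residue
    return mapped


def consensus(predictions: Sequence[Sequence[Optional[str]]]) -> str:
    """Per-residue majority vote; None entries abstain.

    Assumptions for illustration: ties go to the earliest-listed predictor
    among the tied states, and a residue with no votes defaults to coil (C).
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    result = []
    for i in range(length):
        votes = [p[i] for p in predictions if p[i] is not None]
        if not votes:
            result.append("C")
            continue
        counts = Counter(votes)
        best = max(counts.values())
        result.append(next(v for v in votes if counts[v] == best))
    return "".join(result)


def q3(predicted: str, observed: str) -> float:
    """Q3: percentage of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Hypothetical query/template alignment and template structure.
    mapped = map_homologue_ss("MKT-AYIAK", "MKTLAY--K", "CHHHEEC")
    # Two hypothetical sequence-based predictions for the 8-residue query.
    final = consensus([mapped, "CHHHEECC", "CHHCEEHC"])
    print(mapped)                      # ['C', 'H', 'H', 'E', 'E', None, None, 'C']
    print(final, q3(final, "CHHHEECC"))  # CHHEEECC 87.5
```

Listing the structure-derived prediction first means it wins three-way ties under this sketch's tie-break. That is one simple way to favour homologue evidence where it exists, while residues outside the aligned region fall back to the sequence-based votes.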
