Journal: Bioinformatics

SMURFLite: combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins into the twilight zone

Abstract

Motivation: Profile hidden Markov models (HMMs) have been among the most successful methods to date for recognizing evolutionarily related protein sequences. However, these models do not capture the pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase, and can be fully captured by Markov random fields (MRFs). However, MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines simplified MRFs with simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif.

Results: We test SMURFLite on all propeller and barrel folds in the mainly-beta class of the SCOP hierarchy in stringent cross-validation experiments. For beta-structural motif recognition, we show a mean 26% (median 16%) improvement in area under the curve (AUC) compared with HMMER (a well-known HMM method), a mean 33% (median 19%) improvement compared with RAPTOR (a well-known threading method), and even a mean 18% (median 10%) improvement over HHpred (a profile-profile HMM method), despite HHpred's use of extensive additional training data. We demonstrate SMURFLite's ability to scale to whole genomes by running a SMURFLite library of 207 beta-structural SCOP superfamilies against the entire genome of Thermotoga maritima, making over 100 new fold predictions.
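
As a reading aid for the AUC figures above, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not taken from the paper: the function name `auc` and the toy scores are hypothetical, and the abstract does not say how the authors computed AUC or whether the reported percentages are relative or absolute differences.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (a true member
    of the fold) scores higher than a randomly chosen negative, with ties
    counted as half a win. Quadratic in the number of scores, which is
    fine for illustration.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical scores for illustration only (not data from the paper).
toy_positives = [0.9, 0.8, 0.4]
toy_negatives = [0.7, 0.3, 0.2]
toy_auc = auc(toy_positives, toy_negatives)  # 8 of 9 pairs ranked correctly
```

An AUC of 1.0 means every true fold member outranks every non-member, while 0.5 corresponds to random ranking.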
