Journal: BMC Bioinformatics

Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars



Abstract

Background: The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The pfold model couples a stochastic context-free grammar to phylogenetic analysis for a high accuracy in predictions, but the time complexity of the algorithm and underflow errors have prevented its use for long alignments. Here we present PPfold, a multithreaded version of pfold, which is capable of predicting the structure of large RNA alignments accurately on practical timescales.

Results: We have distributed both the phylogenetic calculations and the inside-outside algorithm in PPfold, resulting in a significant reduction of runtime on multicore machines. We have addressed the floating-point underflow problems of pfold by implementing an extended-exponent datatype, enabling PPfold to be used for large-scale RNA structure predictions. We have also improved the user interface and portability: alongside standalone executable and Java source code of the program, PPfold is also available as a free plugin to the CLC Workbenches. We have evaluated the accuracy of PPfold using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction.

Conclusions: PPfold is the first parallelized comparative RNA structure prediction algorithm to date. Based on the pfold model, PPfold is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts. The techniques used in the parallelization of this algorithm may be of general applicability to other bioinformatics algorithms.
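The abstract names an extended-exponent datatype as the remedy for floating-point underflow but does not describe its internals. As an illustration only, the Java sketch below shows one common way such a type can be built: a double mantissa paired with a separate long exponent. The class name `ExtDouble` and all of its methods are hypothetical and are not taken from the PPfold source; the sketch assumes finite inputs and is intended for non-negative probabilities.

```java
/** Hypothetical extended-exponent number: value = mantissa * 2^exponent. */
final class ExtDouble {
    private final double mantissa; // 0.0, or magnitude in [1, 2)
    private final long exponent;   // binary exponent stored outside the double

    private ExtDouble(double mantissa, long exponent) {
        this.mantissa = mantissa;
        this.exponent = exponent;
    }

    /** Wraps a finite double (assumption: no NaN or infinity). */
    static ExtDouble of(double value) {
        return normalize(value, 0L);
    }

    /** Rescales m so its magnitude lies in [1, 2), moving the scale into e. */
    private static ExtDouble normalize(double m, long e) {
        if (m == 0.0) return new ExtDouble(0.0, 0L);
        if (Math.abs(m) < Double.MIN_NORMAL) { // lift subnormals into the normal range
            m = Math.scalb(m, 64);
            e -= 64;
        }
        int shift = Math.getExponent(m);
        return new ExtDouble(Math.scalb(m, -shift), e + shift);
    }

    ExtDouble multiply(ExtDouble other) {
        // Mantissa product stays in [1, 4); exponents simply add.
        return normalize(mantissa * other.mantissa, exponent + other.exponent);
    }

    ExtDouble add(ExtDouble other) {
        if (mantissa == 0.0) return other;
        if (other.mantissa == 0.0) return this;
        ExtDouble hi = exponent >= other.exponent ? this : other;
        ExtDouble lo = (hi == this) ? other : this;
        long diff = hi.exponent - lo.exponent;
        if (diff > 64) return hi; // smaller term is below double precision
        double aligned = Math.scalb(lo.mantissa, (int) -diff);
        return normalize(hi.mantissa + aligned, hi.exponent);
    }

    /** Natural logarithm; meaningful for positive values only. */
    double log() {
        return Math.log(mantissa) + exponent * Math.log(2.0);
    }

    /** Converts back to double; may underflow to 0 or overflow to infinity. */
    double toDouble() {
        long clamped = Math.max(-2000L, Math.min(2000L, exponent));
        return Math.scalb(mantissa, (int) clamped);
    }

    public static void main(String[] args) {
        double plain = 1.0;
        ExtDouble ext = ExtDouble.of(1.0);
        for (int i = 0; i < 10; i++) { // ten factors of 1e-200
            plain *= 1e-200;
            ext = ext.multiply(ExtDouble.of(1e-200));
        }
        System.out.println("plain double product: " + plain);
        System.out.println("log10 of extended product: " + ext.log() / Math.log(10));
    }
}
```

In this sketch the plain double product underflows, while the extended value keeps a usable logarithm. That is the property needed when inside-outside probabilities over long alignments become far smaller than a double can represent.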
