BMC Structural Biology

Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce

Abstract

Background
Ribonucleic acid (RNA) molecules play important roles in many biological processes, including gene expression and regulation. Their secondary structures are crucial for RNA function, and secondary structure prediction is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting the secondary structure of each chunk independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure from the whole RNA sequence. The chunking, prediction, and reconstruction steps can each use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three chunking methods combined with seven popular secondary structure prediction programs. We apply them to two datasets of RNAs with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, and to a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment.

Results
On average, the maximum accuracy retention values are greater than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program with the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family shows that coarse-grained mapping of chunking and prediction in the MapReduce framework yields shorter turnaround times for the shorter RNA sequences. However, as the lengths of the RNA sequences increase, fine-grained mapping can surpass coarse-grained mapping in performance.

Conclusions
By using our MapReduce framework together with statistical analysis of the accuracy retention results, we observe that the inversion-based chunking methods can outperform predictions using the whole sequence. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone.
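The chunk, predict, and reconstruct idea can be sketched as a toy map/reduce in Python. This is a minimal sketch under loudly stated assumptions, not the authors' Hadoop implementation. Fixed-length, non-overlapping chunks stand in for the paper's chunking methods (the centered and inversion-based methods are not reproduced). A simple Nussinov base-pair maximization stands in for the thermodynamic prediction programs such as NUPACK. Base-pair F1 is assumed as the structure similarity measure, and "accuracy retention" is assumed to be the chunked prediction's similarity to the real structure divided by the whole-sequence prediction's similarity, so values above one favor chunking. All function names (`chunk`, `map_predict`, `reduce_reconstruct`, `accuracy_retention`, and so on) and `MIN_LOOP` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[int, int]
# Watson-Crick and G-U wobble pairs, in both orientations.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def nussinov(seq: str) -> Set[Pair]:
    """Stand-in predictor: maximize nested base pairs (not thermodynamic)."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs within seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j is unpaired
            for k in range(i, j - MIN_LOOP):  # case: j pairs with k
                if (seq[k], seq[j]) in CANONICAL:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    pairs: Set[Pair] = set()
    stack = [(0, n - 1)]
    while stack:  # traceback to recover one optimal structure
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in CANONICAL:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    pairs.add((k, j))
                    stack.extend([(i, k - 1), (k + 1, j - 1)])
                    break
    return pairs


def chunk(seq: str, size: int) -> List[Tuple[int, str]]:
    """Fixed-length, non-overlapping chunks tagged with their offsets."""
    return [(start, seq[start:start + size]) for start in range(0, len(seq), size)]


def map_predict(item: Tuple[int, str]) -> Set[Pair]:
    """Map step: predict one chunk, then shift pairs to global indices."""
    offset, sub = item
    return {(i + offset, j + offset) for i, j in nussinov(sub)}


def reduce_reconstruct(parts: Iterable[Set[Pair]]) -> Set[Pair]:
    """Reduce step: merge chunk structures (cross-chunk pairs are lost)."""
    merged: Set[Pair] = set()
    for part in parts:
        merged |= part
    return merged


def predict_chunked(seq: str, size: int) -> Set[Pair]:
    """Chunk the sequence, map the predictor over chunks, reduce the results."""
    return reduce_reconstruct(map(map_predict, chunk(seq, size)))


def f1(pred: Set[Pair], ref: Set[Pair]) -> float:
    """Assumed similarity measure: F1 score over base pairs."""
    if not pred and not ref:
        return 1.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def accuracy_retention(seq: str, ref: Set[Pair], size: int) -> float:
    """Assumed definition: chunked similarity / whole-sequence similarity."""
    whole = f1(nussinov(seq), ref)
    chunked = f1(predict_chunked(seq, size), ref)
    if whole == 0:
        return float("inf") if chunked > 0 else 1.0
    return chunked / whole


def to_dot_bracket(pairs: Set[Pair], n: int) -> str:
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


if __name__ == "__main__":
    seq = "GGGAAACCC" * 2
    print(to_dot_bracket(predict_chunked(seq, 9), len(seq)))  # (((...)))(((...)))
```

Because each chunk is predicted independently, the `map_predict` calls are embarrassingly parallel, which is what makes a MapReduce framework a natural fit; a distributed setup would replace the builtin `map` with map tasks spread across workers. Note that this naive union-based reconstruction cannot recover base pairs that span chunk boundaries, which is one reason the choice of chunking method matters.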