...
首页> 外文期刊>BMC Bioinformatics >Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns
【24h】

Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

机译:快速的在线和基于索引的算法,用于RNA序列结构模式的近似搜索

获取原文
   

获取外文期刊封面封底 >>

       

摘要

Background It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to factor 45. Our best new index-based algorithm achieves a speedup of factor 560. Conclusions The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide with RaligNAtor a robust and well documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator webcite .
机译:背景技术众所周知,如果将序列和结构信息都纳入搜索中,则对同源RNA的搜索将更为有效。然而,由于潜在的序列结构比对问题的高计算成本,目前用于搜索RNA序列结构模式的工具不能完全处理在这两个水平上发生的突变,或者根本不够快,无法搜索大型序列数据库。结果我们提出了新的基于快速索引的在线算​​法,用于RNA序列结构模式的近似匹配,支持对单个碱基和碱基对的全套编辑操作。我们的方法有效地计算了结构RNA模式和目标序列子串的半全局比对,其成本满足用户定义的序列结构编辑距离阈值。为此,我们引入了一种新的计算方案,以针对所有子字符串最佳地重用所需的动态编程矩阵的条目,并将其与避免不匹配子字符串的对齐计算的技术相结合。我们基于索引的新方法利用了从目标数据库预处理的后缀数组,并实现了运行时间,该运行时间在搜索序列的大小上是次线性的。为了支持对具有多个有序序列结构模式的复杂二级结构折叠的RNA分子的描述,我们使用快速算法对近似序列结构模式匹配进行局部或全局链接。链接步骤从中间结果集中删除了虚假匹配,特别是那些特异性很少的模式。在Rfam数据库的基准实验中,我们改进的在线算法比以前的最佳方法要快45倍。我们最好的新的基于索引的算法实现了560倍的提速。以前的方法。这与所提出的基于索引的算法的预期亚线性运行时间一起,首次允许大型序列数据库中RNA序列结构模式的首次近似匹配。除了算法方面的贡献外,我们还为RaligNAtor提供了一个功能强大且文档齐全的开源软件包,可实现本手稿中介绍的算法。 RaligNAtor软件可从http://www.zbh.uni-hamburg.de/ralignator webcite获得。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号