Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

Fernando Meyer; Stefan Kurtz; Michael Beckstette

首页> 外文期刊>BMC Bioinformatics >Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

【24h】

Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

机译：快速的在线和基于索引的算法，用于RNA序列结构模式的近似搜索

获取原文

获取外文期刊封面封底 >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Background It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to factor 45. Our best new index-based algorithm achieves a speedup of factor 560. Conclusions The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide with RaligNAtor a robust and well documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator webcite .

机译：背景技术众所周知，如果将序列和结构信息都纳入搜索中，则对同源RNA的搜索将更为有效。然而，由于潜在的序列结构比对问题的高计算成本，目前用于搜索RNA序列结构模式的工具不能完全处理在这两个水平上发生的突变，或者根本不够快，无法搜索大型序列数据库。结果我们提出了新的基于快速索引的在线算法，用于RNA序列结构模式的近似匹配，支持对单个碱基和碱基对的全套编辑操作。我们的方法有效地计算了结构RNA模式和目标序列子串的半全局比对，其成本满足用户定义的序列结构编辑距离阈值。为此，我们引入了一种新的计算方案，以针对所有子字符串最佳地重用所需的动态编程矩阵的条目，并将其与避免不匹配子字符串的对齐计算的技术相结合。我们基于索引的新方法利用了从目标数据库预处理的后缀数组，并实现了运行时间，该运行时间在搜索序列的大小上是次线性的。为了支持对具有多个有序序列结构模式的复杂二级结构折叠的RNA分子的描述，我们使用快速算法对近似序列结构模式匹配进行局部或全局链接。链接步骤从中间结果集中删除了虚假匹配，特别是那些特异性很少的模式。在Rfam数据库的基准实验中，我们改进的在线算法比以前的最佳方法要快45倍。我们最好的新的基于索引的算法实现了560倍的提速。以前的方法。这与所提出的基于索引的算法的预期亚线性运行时间一起，首次允许大型序列数据库中RNA序列结构模式的首次近似匹配。除了算法方面的贡献外，我们还为RaligNAtor提供了一个功能强大且文档齐全的开源软件包，可实现本手稿中介绍的算法。 RaligNAtor软件可从http://www.zbh.uni-hamburg.de/ralignator webcite获得。

著录项

来源
《BMC Bioinformatics》 |2013年第1期|共页
作者
Fernando Meyer; Stefan Kurtz; Michael Beckstette;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种
中图分类生物科学;
关键词

相似文献

外文文献
中文文献
专利

1. Structator: fast index-based search for RNA sequence-structure patterns [J] . Fernando Meyer, Stefan Kurtz, Rolf Backofen, BMC Bioinformatics . 2011,第1期

机译：Structator：基于快速索引的RNA序列结构模式搜索
2. A Fast Approximate Nearest Neighbor Search Algorithm in the Hamming Space [J] . Malek Esmaeili Mani, Ward Rabab Kreidieh, Fatourechi Mehrdad Pattern Analysis and Machine Intelligence, IEEE Transactions on . 2012,第12期

机译：海明空间中的快速近似最近邻搜索算法
3. A new fast motion estimation algorithm using adaptive size diamond pattern search with early search termination [J] . Shaifali Madan Arora, Navin Rajpal, Ravindra Kumar Purwar International journal of computational vision and robotics . 2017,第6期

机译：一种新的快速运动估计算法，该算法使用自适应大小的菱形模式搜索和早期搜索终止
4. Space Efficient Modifications to Structator- A Fast Index-Based Search Tool for RNA Sequence-Structure Patterns [C] . Benjamin Albrecht, Volker Heun Experimental algorithms. . 2012

机译：结构优化器的空间高效修改-一种基于索引的快速搜索工具，用于RNA序列-结构模式
5. Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach. [D] . Buaba, Ruben. 2012

机译：近似最近邻居搜索的快速局部敏感哈希算法：一种实用的数据挖掘方法。
6. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns [O] . Fernando Meyer, Stefan Kurtz, Michael Beckstette 2013

机译：快速的在线和基于索引的算法用于RNA序列结构模式的近似搜索
7. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns [O] . 2013

机译：快速的在线和基于索引的算法，用于RNA序列结构模式的近似搜索

Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

摘要

著录项

相似文献

相关主题

期刊订阅