Journal of Computer Science & Technology

An Algorithm for Finding Conserved Secondary Structure Motifs in Unaligned RNA Sequences


Abstract

Several experiments and observations have shown that small, distinct local structural features in RNA molecules are correlated with their biological function, for example in the post-transcriptional regulation of gene expression. Finding similar structural features in a set of RNA sequences known to perform the same biological function could therefore provide substantial information about which parts of the sequences are responsible for that function. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even when limited to secondary structure. The main difficulty is that in nearly all cases the structure of the molecules is unknown and has to be predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on a preliminary alignment of the sequences and attempt to predict common structures (local, global, or restricted to selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but perform less well when similarity is limited to small, local elements such as single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information about sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns with a symmetric layout, such as those forming stem-loop structures.
Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA and the domain IV stem-loop structure in SRP RNA.
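
To make the idea of alignment-free structural search concrete, the following is a minimal sketch only and not the algorithm of the paper: it replaces the affix tree with a naive window scan, and it reduces the user-defined degree of similarity to a fixed stem length and a range of loop lengths. The function names `find_hairpins` and `shared_motifs` and all of their parameters are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, stem_len, min_loop=3, max_loop=8):
    """Return (start, loop_len) for each window of seq that can fold into a
    perfect stem of stem_len base pairs enclosing loop_len unpaired bases.

    Naive scan (assumption: no bulges or mismatches in the stem)."""
    hits = []
    n = len(seq)
    for loop_len in range(min_loop, max_loop + 1):
        span = 2 * stem_len + loop_len
        for start in range(n - span + 1):
            end = start + span - 1
            # The k-th base from the left must pair with the k-th from the right.
            if all((seq[start + k], seq[end - k]) in PAIRS for k in range(stem_len)):
                hits.append((start, loop_len))
    return hits


def shared_motifs(seqs, stem_len, min_loop=3, max_loop=8, min_fraction=1.0):
    """Group hairpins by loop length and keep the shapes that occur in at
    least min_fraction of the input sequences.

    Returns {loop_len: {sequence_index: [start positions]}}."""
    support = defaultdict(dict)
    for idx, seq in enumerate(seqs):
        for start, loop_len in find_hairpins(seq, stem_len, min_loop, max_loop):
            support[loop_len].setdefault(idx, []).append(start)
    needed = min_fraction * len(seqs)
    return {loop: occ for loop, occ in support.items() if len(occ) >= needed}


if __name__ == "__main__":
    # Two toy sequences with no sequence similarity but the same stem-loop shape.
    seqs = ["GGGAAAACCC", "AACGCUUUUGCGAA"]
    print(shared_motifs(seqs, stem_len=3))
```

The two toy sequences share no sequence-level similarity, yet both contain a three-pair stem closing a four-base loop, which is the kind of purely structural match that alignment-based methods tend to miss. A scan like this examines every window for every loop length; the abstract attributes the speed-up for such symmetric patterns to the affix tree, which this sketch does not attempt to reproduce.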
