An Algorithm for Finding Conserved Secondary Structure Motifs in Unaligned RNA Sequences

Giulio Pavesi; Giancarlo Mauri; Graziano Pesole

Abstract

Several experiments and observations have shown that small, distinct local structural features of RNA molecules are correlated with their biological function, for example in the post-transcriptional regulation of gene expression. Finding similar structural features in a set of RNA sequences known to perform the same biological function could therefore provide substantial information about which parts of the sequences are responsible for that function. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even when limited to secondary structure. The main difficulty is that in nearly all cases the structure of the molecules is unknown and has to be predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on a preliminary alignment of the sequences and attempt to predict common structures (local, global, or for selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but perform less well when similarity is limited to small, local elements such as single stem-loop motifs. Instead of aligning the sequences, the algorithm we present searches directly for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information about sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns with a symmetric layout, such as those forming stem-loop structures.
Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA and the domain IV stem-loop structure in SRP RNA.
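
The idea of searching unaligned sequences for regions that can fold into the same stem-loop shape can be illustrated with a small sketch. This is not the affix-tree algorithm described in the abstract: it is a naive scan, written here only to show the symmetric, outward-growing layout of a stem-loop. The function names, the pairing rules (Watson-Crick plus G-U wobble) and the default length limits are all assumptions made for the example.

```python
# Illustrative sketch only: a naive scan, NOT the affix-tree search of the paper.
# Pairing rules (Watson-Crick plus G-U wobble) are an assumption of this example.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, min_stem=3, min_loop=3, max_loop=8):
    """Return (start, stem_len, loop_len) for each stem-loop candidate.

    For every loop position and loop length, the stem is grown outward
    in both directions for as long as the flanking bases can pair.
    """
    hits = []
    n = len(seq)
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1          # last base before the loop
            right = loop_start + loop_len  # first base after the loop
            stem = 0
            while left >= 0 and right < n and (seq[left], seq[right]) in PAIRS:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, stem, loop_len))
    return hits


def shared_shapes(seqs, **kwargs):
    """Return the (stem_len, loop_len) shapes found in every input sequence."""
    per_seq = [
        {(stem, loop) for _, stem, loop in find_hairpins(s, **kwargs)}
        for s in seqs
    ]
    return set.intersection(*per_seq) if per_seq else set()


if __name__ == "__main__":
    # Two sequences with no obvious sequence similarity but the same shape.
    print(find_hairpins("GGGAAAACCC"))
    print(shared_shapes(["GGGAAAACCC", "CACUUUUGUG"]))
```

The two example sequences share no obvious sequence similarity, yet both contain a 3-base-pair stem closing a 4-base loop, which is the situation the abstract describes. The scan re-extends the stem from scratch at every position; the abstract attributes the speed-up for such symmetric patterns to the affix tree.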

Bibliographic details

Source
Journal of Computer Science & Technology, 2004, No. 1, pp. 2-12 (11 pages)
Authors
Giulio Pavesi; Giancarlo Mauri; Graziano Pesole
Affiliation
Department of Computer Science, Systems and Communication, University of Milan-Bicocca, Via Bicocca degli Arcimboldi 8, Milan, Italy

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
pattern discovery; RNA secondary structure; affix trees
Date added to database: 2022-08-17 23:45:35

