Spelling Approximate Repeated or Common Motifs Using a Suffix Tree

机译：使用后缀树拼写近似的重复主题或共同主题

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

We present in this paper two algorithms. The first one extracts repeated motifs from a sequence defined over an alphabet sigma. For instance, sigma may be equal to {A, C,G,T} and the sequence represents an encoding of a DNA macromolecule. The motifd searched correspond to words over th same alphabet which occur a minimum number q of times in the sequence with at most e mismatches each time. The second algorithm extracts common motifs from a set of N large than or equal to 2 sequences. In this case, the motifsmust occur, again with at most e mismatches, in 1 less than or equal to q less than or equal to N distinct sequences ofthe set. IN both cases, the words representing the motifs may never be present exactly i nthe sequences. We therefore speak of the motifs, repeatedin a sequence or common to a set of them, as being "external" objects and denote them by the expresion "valid models" if they verify the quorum constraint q. The approach we introduce here for finding all valid models corresponding to either repeated or common motifs starts by building a suffix tree of the sequence(s) and then, after some further presprocessing, uses this tree to simply "spell" the modles. Assuming an alphabet of fixed size, the total time needed is O(nN sup 2v(e,k)) using O(nN sup 2/w) space, where n is the length of the sequence(s), k is the length of the models sought or is the length of the longest possible valid models, w is the size of a word machine and V(e,k) is the number of words of length k at a Hamming distance at most e from another k-length word. V(e,k) may be majored by k sup e sigam sup e. This improves on an algorithm by Waterman [23]. It is also a better time bound than our previous approach [15] for the ocmmon motifs problem whenever N < k sigma, and a better space bound when N/w < k. It is a better time and space bound in absolute for the repeated motifs problem. The complexities obtained in htis second case are O(nV(e,k)) and O(n) respectively. Finally, we suggest how to extend these algorithms to deal with gaps.

机译：我们在本文中提出两种算法。第一个从字母sigma上定义的序列中提取重复的图案。例如，σ可以等于{A，C，G，T}，并且该序列表示DNA大分子的编码。所搜索的图案对应于同一字母上的单词，该单词在序列中出现最少次数q次，每次最多e个不匹配。第二种算法从N个大于或等于2个序列的集合中提取共同的主题。在这种情况下，基序必须再次出现，最多具有e个错配，且小于或等于q小于或等于集合的N个不同序列的数量为1。在这两种情况下，代表基序的单词可能永远不会准确地出现在序列中。因此，我们称这些主题为“外部”对象，这些主题按顺序重复或与它们的集合共有，如果它们验证了法定约束q，则用表达式“有效模型”表示它们。我们在这里介绍的用于找到与重复的或常见的模体相对应的所有有效模型的方法始于构建序列的后缀树，然后在进行一些进一步的预处理之后，使用该树简单地“拼写”模型。假设字母大小固定，则使用O（nN sup 2 / w）空间所需的总时间为O（nN sup 2v（e，k）），其中n是序列的长度，k是长度所寻找模型的长度或是可能的最长有效模型的长度，w是字机器的大小，V（e，k）是距另一k长度汉明距离最多e的长度k的字数字。 V（e，k）可由k sup e sigam sup e主修。这对沃特曼[23]提出的算法进行了改进。每当N

著录项

来源
《LATIN'98: Theoretical informatics》|1998年|p.374-390|共17页
会议地点 Campinas(BR);Campinas(BR)
作者
Marie-France Sagot;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种 eng
中图分类自动化技术、计算机技术;
关键词

相似文献

外文文献
中文文献
专利

1. DRIMust: a web server for discovering rank imbalanced motifs using suffix trees [J] . Inbal Paz, Limor Leibovich, Yael Mandel-Gutfreund, Nucleic acids research . 2013,第W1期

机译：DRIMust：一种用于使用后缀树发现等级不平衡主题的Web服务器
2. Searching maximal degenerate motifs guided by a compact suffix tree. [J] . Jiang H, Zhao Y, Chen W, Advances in Experimental Medicine and Biology . 2010,第Null期

机译：在紧凑的后缀树的引导下搜索最大的简并主题。
3. Suffix tree characterization of maximal motifs in biological sequences [J] . Maria Federico, Nadia Pisanti Theoretical computer science . 2009,第43期

机译：生物序列中最大基序的后缀树表征
4. Spelling Approximate Repeated or Common Motifs Using a Suffix Tree [C] . Marie-France Sagot Latin American symposium on theoretical informatics . 1998

机译：使用后缀树拼写近似重复或共同的主题
5. Finding DNA motifs: A probabilistic suffix tree approach [D] . Majumdar, Abhishek. 2017

机译：寻找DNA基序：概率后缀树法
6. DRIMust: a web server for discovering rank imbalanced motifs using suffix trees [O] . Limor Leibovich, Inbal Paz, Zohar Yakhini, 2013

机译：DRIMust：用于使用后缀树发现等级不平衡主题的Web服务器
7. Spelling Approximate Repeated Or Common Motifs Using a Suffix Tree [O] . Marie-France Sagot, Universitapos 1998

机译：使用后缀树拼写近似重复或常见主题
8. CLM: A Language Binding for Common Lisp and OSF/Motif. User Guide and Reference Manual. Version 2.1. [R] . Baecker, A., Beilken, C., Beriage, T., 1992

机译：CLm：Common Lisp和OsF / motif的语言绑定。用户指南和参考手册。版本2.1。

Spelling Approximate Repeated or Common Motifs Using a Suffix Tree

摘要

著录项

相似文献

相关主题

期刊订阅