Approximate Matching in Weighted Sequences

机译：加权序列中的近似匹配

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

Weighted sequences have been recently introduced as a tool to handle a set of sequences that are not identical but have many local similarities. The weighted sequence is a "statistical image" of this set, where the probability of every symbol's occurrence at every text location is given. We address the problem of approximately matching a pattern in such a weighted sequence. The pattern is a given string and we seek all locations in the set where the pattern occurs with a high enough probability. We define the notion of Hamming distance and edit distance in weighted sequences and give efficient algorithms for computing them. We compute two versions of the Hamming distance in time O(n (m log m)~(1/2)), where n is the length of the weighted text and m is the pattern length. The edit distance is computed in time O(nm) and O(nm~2), depending on the edit distance definition used. Unfortunately, due to space considerations, the edit distance details are left to the journal version. We also define the notion of weighted matching in infinite alphabets and show that exact weighted matching can be computed in time O(s log~2 s), where s is the number of text symbols having non-zero probability. The weighted Hamming distance over infinite alphabets can be computed in time min(O(kn s~(1/2) + s~(3/2) log~2 s), O(s~(4/3)m~(1/3) log s)).

机译：最近引入了加权序列作为处理一组不相同但具有许多局部相似性的序列的工具。加权序列是该集合的“统计图像”，其中给出了每个符号在每个文本位置出现的概率。我们解决了在这种加权序列中近似匹配模式的问题。模式是给定的字符串，我们以足够高的概率查找集合中模式发生的所有位置。我们定义了汉明距离的概念并按加权序列编辑距离，并给出了有效的算法来计算它们。我们在时间O（n（m log m）〜（1/2））中计算汉明距离的两个版本，其中n是加权文本的长度，m是图案长度。根据使用的编辑距离定义，以时间O（nm）和O（nm〜2）计算编辑距离。不幸的是，出于空间考虑，编辑距离的详细信息留给日志版本。我们还定义了无限字母中的加权匹配的概念，并表明可以在时间O（s log〜2 s）中计算出精确的加权匹配，其中s是具有非零概率的文本符号的数量。无限字母上的加权汉明距离可以在时间min（O（kn s〜（1/2）+ s〜（3/2）log〜2 s），O（s〜（4/3）m〜（ 1/3）log s））。

著录项

来源
《Annual Symposium on Combinatorial Pattern Matching(CPM 2006); 20060705-07; Barcelona(ES)》|2006年|P.365-376|共12页
会议地点 Barcelona(ES)
作者
Amihood Amir; Costas Iliopoulos; Oren Kapah; Ely Porat;
展开▼
作者单位

Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel;

展开▼
会议组织
原文格式 PDF
正文语种 eng
中图分类计算技术、计算机技术;
关键词

相似文献

外文文献
中文文献
专利

1. Applying Agrep to r-NSA to solve multiple sequences approximate matching [J] . Bing Ni, Man-Hon Wong, Chi-Fai David Lam, International journal of data mining and bioinformatics . 2014,第4期

机译：将Agrep应用于r-NSA以解决多个序列的近似匹配
2. A New Method of Approximating the Probability of Matching Common Words in Multiple Random Sequences [J] . Haiman G., Preda C. Methodology and computing in applied probability . 2010,第4期

机译：逼近多个随机序列中常见单词匹配概率的新方法
3. APPROXIMATE MATCHING OF STRUCTURED MOTIFS IN DNA SEQUENCES [J] . Nadia El-Mabrouk, Mathieu Raffinot, Jean-Eudes Duchesne, Journal of Bioinformatics and Computational Biology . 2005,第2期

机译：DNA序列中结构化运动的近似匹配
4. Approximate Matching in Weighted Sequences [C] . Amihood Amir, Costas Iliopoulos, Oren Kapah, Annual Symposium on Combinatorial Pattern Matching . 2006

机译：加权序列中的近似匹配
5. Approximate sequence matching for fast visual retrieval. [D] . Yeh, Mei-Chen. 2009

机译：近似序列匹配，可快速进行视觉检索。
6. Computer program for the IBM personal computer which searches for approximate matches to short oligonucleotide sequences in long target DNA sequences. [O] . E W Myers, D W Mount 1986

机译：IBM个人计算机的计算机程序用于搜索长目标DNA序列中短寡核苷酸序列的近似匹配。
7. Optimizing Approximate Weighted Matching on Nvidia Kepler K40 [O] . Naim, Md., Manne, Fredrik, Halappanavar, Mahantesh, 2018

机译：优化Nvidia Kepler K40的近似加权匹配

Approximate Matching in Weighted Sequences

摘要

著录项

相似文献

相关主题

期刊订阅