Common Substrings in Random Strings

Abstract

In computational biology, an important problem is to identify a word of length k present in each of a given set of sequences. Here, we investigate the problem of calculating the probability that such a word exists in a set of r random strings. Existing methods to approximate this probability are either inaccurate when r > 2 or are restricted to Bernoulli models. We introduce two new methods for computing this probability under Bernoulli and Markov models. We present generalizations of the methods to compute the probability of finding a word of length k shared among q of r sequences, and to allow mismatches. We show through simulations that our approximations are significantly more accurate than methods previously published.
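The simulations mentioned in the abstract can be pictured with a small Monte Carlo estimate. The sketch below is not the authors' method. It is a brute-force baseline under a uniform Bernoulli model, and it makes several assumptions that the abstract does not state: a DNA alphabet, a common string length n, and the hypothetical function names kmers, has_common_word and estimate_probability.

```python
import random


def kmers(s, k):
    """Return the set of all length-k substrings (words) of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def has_common_word(strings, k):
    """Return True if some word of length k occurs in every string."""
    common = kmers(strings[0], k)
    for s in strings[1:]:
        common &= kmers(s, k)
        if not common:
            return False  # no candidate word survives, stop early
    return bool(common)


def estimate_probability(r, n, k, trials=2000, alphabet="ACGT", seed=0):
    """Estimate P(a length-k word is shared by all r strings).

    Assumption: each string has length n and its letters are drawn
    independently and uniformly from `alphabet` (a Bernoulli model).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        strings = [
            "".join(rng.choice(alphabet) for _ in range(n))
            for _ in range(r)
        ]
        hits += has_common_word(strings, k)
    return hits / trials
```

A call such as estimate_probability(r=3, n=100, k=5) returns the fraction of trials in which all three random strings share at least one word of length 5. An estimate of this kind is only a reference point against which an analytical approximation could be compared. It does not cover the Markov model, the q-of-r case, or mismatches.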

Bibliographic details

Source: Annual Symposium on Combinatorial Pattern Matching (CPM 2006); 5-7 July 2006; Barcelona (ES) | 2006 | pp. 129-140 | 12 pages
Conference location: Barcelona (ES)
Authors: Eric Blais; Mathieu Blanchette
Author affiliation: McGill Centre for Bioinformatics and School of Computer Science, McGill University, Montreal, Quebec, Canada
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
