Common Substrings in Random Strings


Abstract

In computational biology, an important problem is to identify a word of length k present in each of a given set of sequences. Here, we investigate the problem of calculating the probability that such a word exists in a set of r random strings. Existing methods to approximate this probability are either inaccurate when r > 2 or are restricted to Bernoulli models. We introduce two new methods for computing this probability under Bernoulli and Markov models. We present generalizations of the methods to compute the probability of finding a word of length k shared among q of r sequences, and to allow mismatches. We show through simulations that our approximations are significantly more accurate than methods previously published.
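To make the quantity concrete, the probability that some word of length k occurs in every one of r random strings can be estimated by brute-force simulation. The sketch below is not one of the paper's approximation methods. It is a minimal Monte Carlo baseline that assumes uniform i.i.d. letters (a special case of a Bernoulli model) and strings of equal length n. The function names, the DNA alphabet, and the default trial count are illustrative choices, not taken from the source.

```python
import random


def has_common_word(strings, k):
    """Return True if some word of length k occurs in every string."""
    # Start from the k-mers of the first string and intersect with the rest.
    first = strings[0]
    common = {first[i:i + k] for i in range(len(first) - k + 1)}
    for s in strings[1:]:
        common &= {s[i:i + k] for i in range(len(s) - k + 1)}
        if not common:
            return False
    return bool(common)


def estimate_probability(r, n, k, alphabet="ACGT", trials=2000, seed=0):
    """Monte Carlo estimate of P(a length-k word is shared by all r strings).

    Assumption: letters are drawn independently and uniformly from
    `alphabet`, and all r strings have the same length n.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        strings = [
            "".join(rng.choice(alphabet) for _ in range(n))
            for _ in range(r)
        ]
        if has_common_word(strings, k):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Example call: 3 random DNA strings of length 100, shared word of length 5.
    print(estimate_probability(r=3, n=100, k=5))
```

A simulation like this is slow when the target probability is small, since many trials are needed to see any hits. That is one reason analytic approximations are of interest.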
