首页> 外文会议>String processing and information retrieval >Faster Algorithms for Sampling and Counting Biological Sequences
【24h】

Faster Algorithms for Sampling and Counting Biological Sequences

机译:更快的生物序列采样和计数算法

获取原文
获取原文并翻译 | 示例

摘要

A set of sequences S is pairwise bounded if the Hamming distance between any pair of sequences in S is at most 2d. The Consensus Sequence problem aims to discern between pairwise bounded sets that have a consensus, and if so, finding one such sequence s~*, and those that do not. This problem is closely related to the motif-recognition problem, which abstractly models finding important subsequences in biological data. We give an efficient algorithm for sampling pairwise bounded sets, referred to as MarkovSampling, and show it generates pairwise bounded sets uniformly at random. We illustrate the applicability of MarkovSampling to efficiently solving motif-recognition instances. Computing the expected number of motif sets has been a long-standing open problem in motif-recognition [1,3]. We consider the related problem of counting the number of pairwise bounded sets, give new bounds on number of pairwise bounded sets, and present an algorithmic approach to counting the number of pairwise bounded sets.
机译:如果S中任意一对序列之间的汉明距离最多为2d,则一组序列S是成对有界的。共识序列问题旨在区分具有共识的成对有界集,如果是,则找到一个这样的序列s〜*,以及那些没有的序列。这个问题与主题识别问题密切相关,后者是抽象模型,用于寻找生物学数据中的重要子序列。我们给出了一种有效的对成对有界集合进行采样的算法,称为MarkovSampling,并显示了该算法随机均匀地生成成对有界集合。我们说明了MarkovSampling在有效解决主题识别实例方面的适用性。计算主题集的预期数量一直是主题识别中一个长期存在的开放性问题[1,3]。我们考虑了计数成对有界集合的数量的相关问题,对成对有界集合的数量给出了新的界,并提出了一种计算成对有界集合的数量的算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号