首页> 外文会议>Design and analysis of algorithms >Faster Variance Computation for Patterns with Gaps
【24h】

Faster Variance Computation for Patterns with Gaps

机译:有间隙的图案的方差计算更快

获取原文
获取原文并翻译 | 示例

摘要

Determining whether a pattern is statistically overrepresented or underrepresented in a string is a fundamental primitive in computational biology and in large-scale text mining. We study ways to speed up the computation of the expectation and variance of the number of occurrences of a pattern with rigid gaps in a random string. Our contributions are twofold: first, we focus on patterns in which groups of characters from an alphabet Σ can occur at each position. We describe a way to compute the exact expectation and variance of the number of occurrences of a pattern w in a random string generated by a Markov chain in O(|w|~2) time, improving a previous result that required 0(2~(|w|)) time. We then consider the problem of computing expectation and variance of the motifs of a string s in an IID text. Motifs are rigid gapped patterns that occur at least twice in s, and in which at most one character from E occurs at each position. We study the case in which s is given offline, and an arbitrary motif w of s is queried online. We relate computational complexity to the structure of w and s, identifying sets of motifs that are amenable to o(w log |w|) time online computation after O(|s|~3) preprocessing of s. Our algorithms lend themselves to efficient implementations.
机译:确定模式在字符串中在统计上是否代表过多或不足,是计算生物学和大规模文本挖掘中的基本基本知识。我们研究了加快随机串中具有刚性间隙的模式的出现次数的期望值和方差的计算方法。我们的贡献是双重的:首先,我们关注模式,在这种模式中,字母Σ的字符组可以出现在每个位置。我们描述了一种方法,可以计算在O(| w |〜2)时间内由马尔可夫链生成的随机字符串中模式w的出现次数的确切期望和方差,从而改进了以前需要0(2〜 (| w |))时间。然后,我们考虑计算IID文本中字符串s的图案的期望和方差的问题。母题是刚性的有间隙的图案,在s中至少发生两次,并且每个位置最多出现一个来自E的字符。我们研究了s脱机给定,并且在线查询s的任意主题w的情况。我们将计算复杂度与w和s的结构相关联,确定适合s的O(| s |〜3)的o(w log | w |)时间在线计算的图案集。我们的算法适用于高效的实现。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号