Journal: Methodology and Computing in Applied Probability

A New Method of Approximating the Probability of Matching Common Words in Multiple Random Sequences

Abstract

In this paper we consider R independent sequences of length T formed by independent, not necessarily uniformly distributed letters drawn from a finite alphabet. We first develop a new and efficient method of calculating the expectation E(N_R) = E(N_R(m, T)) of the number N_R(m, T) of distinct words of length m that are common to the R sequences. We then consider the case of four uniformly distributed letters. We determine a b_R = b_R(m, T) ≥ 0 such that the interval [E(N_R) - b_R; E(N_R)] contains the probability p_R = P(N_R ≥ 1) that there exists a word of length m common to the R sequences. We show that b_R ≈ 0.07 E(N_R) if R = 3 and b_R ≤ 0.05 E(N_R) if R ≥ 4. Thus, for unusual common words, i.e. such that p_R is small, E(N_R) provides a very accurate approximation of this probability. We then compare numerically the intervals [E(N_R) - b_R; E(N_R)] with former approximations of p_R provided by Karlin and Ost (Ann Probab 16:535-563, 1988) and Naus and Sheng (Bull Math Biol 59(3):483-495, 1997).
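
The quantities in the abstract can be illustrated with a small Monte Carlo sketch. This is not the calculation method of the paper, which is not described here; it is a brute-force simulation assuming uniformly distributed letters over a four-letter alphabet. The function names `common_words` and `simulate` and all parameter values are hypothetical, chosen only for illustration. The sketch estimates E(N_R) and p_R = P(N_R ≥ 1) empirically; since N_R is a non-negative integer, the estimate of p_R can never exceed the estimate of E(N_R), which is consistent with E(N_R) being the upper end of the interval.

```python
import random


def common_words(seqs, m):
    """Return the set of distinct length-m words that occur in every sequence."""
    word_sets = [{s[i:i + m] for i in range(len(s) - m + 1)} for s in seqs]
    return set.intersection(*word_sets)


def simulate(R, T, m, alphabet="ACGT", trials=2000, seed=0):
    """Estimate E(N_R) and p_R = P(N_R >= 1) by simulation.

    Assumption: letters are independent and uniform over `alphabet`.
    """
    rng = random.Random(seed)
    total_words = 0   # running sum of N_R over all trials
    hits = 0          # number of trials with at least one common word
    for _ in range(trials):
        seqs = ["".join(rng.choice(alphabet) for _ in range(T)) for _ in range(R)]
        n = len(common_words(seqs, m))
        total_words += n
        hits += n >= 1
    return total_words / trials, hits / trials


if __name__ == "__main__":
    mean_n, p_hat = simulate(R=3, T=50, m=6)
    print(f"estimated E(N_R) = {mean_n:.4f}, estimated p_R = {p_hat:.4f}")
```

When common words are rare, almost every trial with a match has exactly one, so the two estimates should be close; this is the regime in which the abstract says E(N_R) approximates p_R accurately.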
