A New Method of Approximating the Probability of Matching Common Words in Multiple Random Sequences

Haiman G.; Preda C.


Abstract

In this paper we consider R independent sequences of length T formed by independent, not necessarily uniformly distributed letters drawn from a finite alphabet. We first develop a new and efficient method of calculating the expectation E(N_R) = E(N_R(m, T)) of the number of distinct words of length m, N_R(m, T), which are common to R such sequences. We then consider the case of four uniformly distributed letters. We determine a b_R = b_R(m, T) ≥ 0 such that the interval [E(N_R) - b_R; E(N_R)] contains the probability p_R = P(N_R ≥ 1) that there exists a word of length m common to the R sequences. We show that b_R ≈ 0.07 E(N_R) if R = 3 and b_R ≤ 0.05 E(N_R) if R ≥ 4. Thus, for unusual common words, i.e. such that p_R is small, E(N_R) provides a very accurate approximation of this probability. We then compare numerically the intervals [E(N_R) - b_R; E(N_R)] with former approximations of p_R provided by Karlin and Ost (Ann Probab 16:535-563, 1988) and Naus and Sheng (Bull Math Biol 59(3):483-495, 1997).
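
The relation between E(N_R) and p_R = P(N_R ≥ 1) described in the abstract can be explored empirically. The following is a minimal Monte Carlo sketch in Python; it is not the authors' analytical method, only a brute-force simulation. The function names (common_words, simulate), the alphabet "ACGT", and all parameter values are illustrative assumptions and do not come from the paper. Since N_R is a non-negative integer, the estimate of p_R can never exceed the estimate of E(N_R), which matches the upper end of the interval [E(N_R) - b_R; E(N_R)].

```python
import random


def common_words(seqs, m):
    """Return the set of distinct length-m words that occur in every sequence."""
    word_sets = [{s[i:i + m] for i in range(len(s) - m + 1)} for s in seqs]
    return set.intersection(*word_sets)


def simulate(R, T, m, trials=2000, alphabet="ACGT", seed=0):
    """Estimate E(N_R) and p_R = P(N_R >= 1) by simulation.

    Assumption: letters are independent and uniform on `alphabet`
    (the four-letter uniform case mentioned in the abstract).
    """
    rng = random.Random(seed)
    total = 0  # running sum of N_R over all trials
    hits = 0   # number of trials with at least one common word
    for _ in range(trials):
        seqs = ["".join(rng.choice(alphabet) for _ in range(T)) for _ in range(R)]
        n = len(common_words(seqs, m))
        total += n
        hits += 1 if n >= 1 else 0
    return total / trials, hits / trials


if __name__ == "__main__":
    # Hypothetical parameter choice: R = 3 sequences of length T = 100, words of length m = 8.
    mean_n, p_hat = simulate(R=3, T=100, m=8)
    print(f"estimated E(N_R) = {mean_n:.4f}, estimated p_R = {p_hat:.4f}")
```

When m is chosen large enough that common words are rare, the two estimates should come out close to each other, which is the regime the abstract describes. A simulation of this kind needs many trials to resolve small probabilities, so it is only useful as a sanity check.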


Bibliographic details

Source
Methodology and Computing in Applied Probability | 2010, Issue 4 | 21 pages
Authors
Haiman G.; Preda C.
Original format: PDF
Language: English
Chinese Library Classification: Applied mathematics
Keywords
1-dependent sequences; Genetic sequences; Longest success run; Matching a common word; Poisson approximation;


Similar literature


1. A New Method of Approximating the Probability of Matching Common Words in Multiple Random Sequences [J]. Haiman G., Preda C. Methodology and Computing in Applied Probability. 2010, Issue 4
2. Asymptotic behaviour and optimal word size for exact and approximate word matches between random sequences [J]. Sylvain Forêt, Miriam R Kantorovitz, Conrad J Burden. BMC Bioinformatics. 2006, Supplement 5
3. Applying Agrep to r-NSA to solve multiple sequences approximate matching [J]. Bing Ni, Man-Hon Wong, Chi-Fai David Lam. International journal of data mining and bioinformatics. 2014, Issue 4
4. Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment [C]. Yusuke Yasuda, Xin Wang, Junichi Yamagishi. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020
5. Statistical Methods for Aggregation of Sequence Data and Multiple Testing Correction in Common and Rare Variant Analysis [D]. Chen, Zhongsheng. 2020
6. Asymptotic behaviour and optimal word size for exact and approximate word matches between random sequences [O]. Sylvain Forêt, Miriam R Kantorovitz, Conrad J Burden. 2006
7. Asymptotic behaviour and optimal word size for exact and approximate word matches between random sequences [O]. Kantorovitz Miriam R, Forêt Sylvain, Burden Conrad J. 2006
8. Development of Randomized Load Sequences with Transition Probabilities Based on a Markov Process [R]. Heller, R. A., Shinozuka, M. 1964
