Journal: Theoretical Computer Science

On overabundant words and their application to biological sequence analysis


Abstract

The observed frequency of the longest proper prefix, the longest proper suffix, and the longest infix of a word w in a given sequence x can be used for classifying w as avoided or overabundant. The definitions used for the expectation and deviation of w in this statistical model were described and biologically justified by Brendel et al. (1986) [1]. We have very recently introduced a time-optimal algorithm for computing all avoided words of a given sequence over an integer alphabet (2017) [2]. In this article, we extend this study by presenting an O(n)-time and O(n)-space algorithm for computing all overabundant words in a sequence x of length n over an integer alphabet. Our main result is based on a new non-trivial combinatorial property of the suffix tree tau of x: the number of distinct factors of x whose longest infix is the label of an explicit node of tau is no more than 3n - 4. We further show that the presented algorithm is time-optimal by proving that O(n) is a tight upper bound for the number of overabundant words. Finally, we present experimental results, using both synthetic and real data, which justify the effectiveness and efficiency of our approach in practical terms. (C) 2018 Elsevier B.V. All rights reserved.
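To make the classification concrete, here is a minimal brute-force sketch in Python. It is not the O(n)-time suffix-tree algorithm of the paper. The abstract does not state the formulas, so the sketch assumes the forms commonly attributed to the Brendel et al. model: expectation E(w) = f(prefix) * f(suffix) / f(infix), deviation (f(w) - E(w)) / max(sqrt(E(w)), 1), and w counted as overabundant when its deviation is at least a positive threshold rho. The function names, the threshold parameter, and the restriction to words of length at least 3 are illustrative choices, not taken from the source.

```python
from collections import Counter
from math import sqrt


def factor_counts(x):
    """Count occurrences (overlaps included) of every non-empty factor of x.

    Quadratic-size brute force, for illustration on short strings only.
    """
    counts = Counter()
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n + 1):
            counts[x[i:j]] += 1
    return counts


def deviation(w, counts):
    """Deviation of w under the ASSUMED model described in the lead-in.

    Returns None when the infix never occurs, so the expectation is undefined.
    """
    prefix, suffix, infix = w[:-1], w[1:], w[1:-1]
    f_infix = counts[infix]
    if f_infix == 0:
        return None
    expected = counts[prefix] * counts[suffix] / f_infix
    return (counts[w] - expected) / max(sqrt(expected), 1.0)


def overabundant_words(x, rho):
    """Return, sorted, the factors of x (length >= 3) with deviation >= rho."""
    counts = factor_counts(x)
    result = []
    for w in counts:
        if len(w) < 3:  # assumed: the infix must be non-empty
            continue
        dev = deviation(w, counts)
        if dev is not None and dev >= rho:
            result.append(w)
    return sorted(result)
```

For example, in x = "abcabcxbyb" the word "abc" occurs twice, "ab" and "bc" occur twice each, and the infix "b" occurs four times, so under the assumed formulas E("abc") = 2 * 2 / 4 = 1 and the deviation is (2 - 1) / 1 = 1. Calling overabundant_words("abcabcxbyb", 1.0) therefore returns ["abc"].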
