首页> 外文会议>International Symposium on Fundamentals of Computation Theory >Minimal Absent Words in a Sliding Window and Applications to On-Line Pattern Matching
【24h】

Minimal Absent Words in a Sliding Window and Applications to On-Line Pattern Matching

机译:滑动窗口中的最小缺席单词及其在在线模式匹配中的应用

获取原文

摘要

An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al. in Inf Process Lett 67:111-117, 1998; Belazzougui et al. in ESA 8125:133-144, 2013; Barton et al. in BMC Bioinform 15:388, 2014). Minimal absent words are used for data compression (Crochemore et al. in Proc IEEE 88:1756-1768, 2000, Ota and Morita in Theoret Comput Sci 526:108-119, 2014) and for alignment-free sequence comparison by utilizing a metric based on minimal absent words (Chairungsee and Crochemore in Theoret Comput Sci 450:109-116, 2012). They are also used in molecular biology; for instance, three minimal absent words of the human genome were found to play a functional role in a coding region in Ebola virus genomes (Silva et al. in Bioinformatics 31:2421-2425, 2015). In this article we introduce a new application of minimal absent words for on-line pattern matching. Specifically, we present an algorithm that, given a pattern x and a text y, computes the distance between x and every window of size |x| on y. The running time is O(σ|y|), where σ is the size of the alphabet. Along the way, we show an O(σ|y|)-time and O(σ|x|)-space algorithm to compute the minimal absent words of every window of size |x| on y, together with some new combinatorial insight on minimal absent words.
机译:单词y中缺少(或禁止)的单词是在y中不出现的单词。如果所有适当的因素都在y中发生,则称为极小值。存在用于计算y的所有最小缺失词的线性时间和线性空间算法(Crochemore等人,Inf Process Lett 67:111-117,1998; Belazzougui等人,ESA 8125:133-144,2013; Barton等人在BMC Bioinform 15:388,2014中)。最小的缺席单词用于数据压缩(Crochemore等人,Proc IEEE 88:1756-1768,2000,Ota and Morita在Theoret Comput Sci 526:108-119,2014)和通过使用度量标准进行无比对序列比较基于最小的缺席单词(Chairungsee and Crochemore in Theoret Comput Sci 450:109-116,2012)。它们还用于分子生物学。例如,发现人类基因组的三个最小缺失词在埃博拉病毒基因组的编码区中发挥功能性作用(Silva et al。in Bioinformatics 31:2421-2425,2015)。在本文中,我们介绍了一种用于在线模式匹配的最小缺席单词的新应用程序。具体来说,我们提供一种算法,给定模式x和文本y,可计算x与大小为| x |的每个窗口之间的距离。在y上。运行时间为O(σ| y |),其中σ是字母的大小。在此过程中,我们展示了O(σ| y |)时间和O(σ| x |)-空间算法来计算大小为| x |的每个窗口的最小缺席单词在y上,以及对最小缺席单词的一些新的组合见解。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号