首页> 外文会议>International Symposium on Fundamentals of Computation Theory >Minimal Absent Words in a Sliding Window and Applications to On-Line Pattern Matching
【24h】

Minimal Absent Words in a Sliding Window and Applications to On-Line Pattern Matching

机译:在滑动窗口和应用程序中最小的缺席单词,以便在线模式匹配

获取原文

摘要

An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al. in Inf Process Lett 67:111-117, 1998; Belazzougui et al. in ESA 8125:133-144, 2013; Barton et al. in BMC Bioinform 15:388, 2014). Minimal absent words are used for data compression (Crochemore et al. in Proc IEEE 88:1756-1768, 2000, Ota and Morita in Theoret Comput Sci 526:108-119, 2014) and for alignment-free sequence comparison by utilizing a metric based on minimal absent words (Chairungsee and Crochemore in Theoret Comput Sci 450:109-116, 2012). They are also used in molecular biology; for instance, three minimal absent words of the human genome were found to play a functional role in a coding region in Ebola virus genomes (Silva et al. in Bioinformatics 31:2421-2425, 2015). In this article we introduce a new application of minimal absent words for on-line pattern matching. Specifically, we present an algorithm that, given a pattern x and a text y, computes the distance between x and every window of size |x| on y. The running time is O(σ|y|), where σ is the size of the alphabet. Along the way, we show an O(σ|y|)-time and O(σ|x|)-space algorithm to compute the minimal absent words of every window of size |x| on y, together with some new combinatorial insight on minimal absent words.
机译:单词y的缺席(或禁止)字是一个在y中不会发生的单词。然后,如果在y中出现所有适当的因素,则称为最小。存在线性时间和线性空间算法,用于计算y的所有最小缺陷字(Crochemore等人。在Inf Process Lett 67:111-117,1998; Belazzougui等人。在ESA 8125:133-144,2013; Barton等人。在BMC生物键形15:388,2014中)。最小的缺席词用于数据压缩(Crochemore等人。在Proc Ieee 88:1756-1768,2000,Ota和Morita在Theoret Comput SCI 526:108-119,2014)和通过利用度量来进行对齐序列比较基于最小的缺席词(Theoret Comput SCI 450:109-116,2012)的基础上它们也用于分子生物学;例如,发现人类基因组的三个最小缺失词在埃博拉病毒基因组(Silva等人。在生物信息学31:2421-2425,2015)中起作用在埃博拉病毒基因组的编码区中的功能作用。在本文中,我们介绍了在线模式匹配的最小缺席单词的新应用。具体地,我们介绍了一种算法,给定图案x和文本y,计算x和每个大小窗口之间的距离| x |在y。运行时间是O(σ|),其中σ是字母表的大小。沿途,我们显示O(σ| y |) - time和O(σ| x |) - 空间算法来计算每个大小的每个窗口的最小缺点单词| x |在Y,与一些新的组合洞察最小的缺席。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号