首页> 外文会议>International Conference on Current Trends in Theory andractice of Computer Science >Novel Results on the Number of Runs of the Burrows-Wheeler-Transform
【24h】

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

机译:新颖的挖掘机运行数量 - 轮车 - 变换的数量

获取原文

摘要

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as r) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the r parameter as measure of repetitiveness, especially to evaluate the performance in terms of both space and time of compressed indexing data structures. In this paper, we investigate ρ(ν), the ratio of r and of the number of runs of the BWT of the reverse of ν. Kempa and Kociumaka [FOCS 2020] gave the first non-trivial upper bound as ρ(ν) = O(log~2(n)), for any string v of length n. However, nothing is known about the tightness of this upper bound. We present infinite families of binary strings for which ρ(ν) = Θ(log n) holds, thus giving the first non-trivial lower bound on ρ(n), the maximum over all strings of length n. Our results suggest that r is not an ideal measure of the repetitiveness of the string, since the number of repeated factors is invariant between the string and its reverse. We believe that there is a more intricate relationship between the number of runs of the BWT and the string's combinatorial properties.
机译:挖掘机轮式变换(BWT),可逆字符串转换,是字符串处理中许多当前数据结构的基本组件之一。它是数据压缩中的核心,以及序列数据的有效查询算法,例如网页,基因组和其他生物序列,或者确实是任何文本数据。 BWT很好地压缩,因为它的相等字母运行(通常称为R)的数量通常远低于原始字符串的数量;特别是,它非常适合具有许多重复因素的字符串。事实上,由于重复性的衡量标准,已经向R参数支付了很多关注,特别是在压缩索引数据结构的空间和时间方面评估性能。在本文中,我们研究了ρ(ν),R和BWT的延伸的次数的比率。 Kempa和Kociumaka [Focs 2020]给出了第一个非平凡的上限为ρ(ν)= O(log〜2(n)),对于长度n的任何弦v。然而,关于这种上限的密封性,没有任何内容。我们存在ρ(ν)=θ(log n)保持的二进制串的无限系列,从而给出ρ(n)的第一个非微小界限,最大限度为ρ(n)。我们的结果表明,R不是弦重复性的理想衡量标准,因为串之间的重复因子数量是不变的。我们认为,BWT的运行数量和字符串组合属性之间存在更复杂的关系。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号