首页> 外文期刊>Algorithmica >Lightweight Data Indexing and Compression in External Memory
【24h】

Lightweight Data Indexing and Compression in External Memory

机译:外部存储器中的轻量级数据索引和压缩

获取原文
获取原文并翻译 | 示例
           

摘要

In this paper we describe algorithms for computing the Burrows-Wheeler Transform (bwt) and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight in the sense that, for an input of size n, they use only n bits of working space on disk while all previous approaches use Θ(n logn) bits. This is achieved by building the bwt directly without passing through the construction of the Suffix Array/Tree data structure. Moreover, our algorithms access disk data only via sequential scans, thus they take full advantage of modern disk features that make sequential disk accesses much faster than random accesses. We also present a scan-based algorithm for inverting the bwt that uses Θ(n) bits of working space, and a lightweight internal-memory algorithm for computing the bwt which is the fastest in the literature when the available working space is o(n) bits. Finally, we prove lower bounds on the complexity of computing and inverting the bwt via sequential scans in terms of the classic product: internal-memory space × number of passes over the disk data, showing that our algorithms are within an O(logn) factor of the optimal.
机译:在本文中,我们描述了用于计算Burrows-Wheeler变换(bwt)和在外部存储器中建立(压缩)索引的算法。我们算法的创新功能是轻量级,即对于大小为n的输入,它们仅使用磁盘上的n位工作空间,而所有先前的方法都使用Θ(n logn)位。这是通过直接构建bwt而无需经过后缀数组/树数据结构的构建来实现的。此外,我们的算法仅通过顺序扫描访问磁盘数据,因此它们充分利用了现代磁盘功能,这些功能使顺序磁盘访问比随机访问快得多。我们还提出了一种基于扫描的算法来反转使用Θ(n)位工作空间的bwt,以及一种轻量级的内部内存算法来计算bwt,当可用工作空间为o(n时,这是文献中最快的)位。最后,对于经典产品,我们通过顺序扫描证明了计算和反转bwt的复杂性的下限:内部存储器空间×磁盘数据的通过次数,这表明我们的算法在O(logn)因子之内最佳。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号