...
首页> 外文期刊>Theoretical computer science >Balancing and clustering of words in the Burrows-Wheeler transform
【24h】

Balancing and clustering of words in the Burrows-Wheeler transform

机译:Burrows-Wheeler变换中单词的平衡和聚类

获取原文
获取原文并翻译 | 示例
           

摘要

Compression algorithms based on Burrows-Wheeler transform (BWT) take advantage of the fact that the word output of BWT shows a local similarity and then turns out to be highly compressible. The aim of the present paper is to study such "clustering effect" by using notions and methods from Combinatorics on Words. The notion of balance of a word plays a central role in our investigation. Empirical observations suggest that balance is actually the combinatorial property of input word that ensure optimal BWT compression. Moreover, it is reasonable to assume that the more balanced the input word is, the more local similarity we have after BWT (and therefore the better the compression is). This hypothesis is here corroborated by experiments on "real" text, by using local entropy as a measure of the degree of balance of a word. In the setting of Combinatorics on Words, a sound confirmation of previous hypothesis is given by a result of Mantaci et al. (2003) [27], which states that, in the case of a binary alphabet, there is an equivalence between circularly balanced words, words having a clusterized BWT, and the conjugates of standard words. In the case of alphabets of size greater than two, there is no more equivalence. The last section of the present paper is devoted to investigate the relationships between these notions, and other related ones (as, for instance, palindromic richness) in the case of a general alphabet.
机译:基于Burrows-Wheeler变换(BWT)的压缩算法利用了一个事实,即BWT的单词输出显示局部相似性,然后证明是高度可压缩的。本文的目的是通过使用组合词学的概念和方法来研究这种“聚类效应”。单词平衡的概念在我们的研究中起着核心作用。实证观察表明,平衡实际上是确保最佳BWT压缩的输入单词的组合属性。此外,可以合理地假设输入单词越平衡,我们在BWT之后拥有的局部相似度就越高(因此,压缩效果越好)。通过使用局部熵作为单词平衡度的度量,此假设在“真实”文本上的实验得到了证实。在单词组合学的设置中,Mantaci等人的结果给出了对先前假设的正确确认。 (2003)[27],其中指出,在二进制字母的情况下,圆形平衡词,具有聚类BWT的词和标准词的共轭词之间是等效的。如果大小大于2的字母不再等效。本文的最后一部分专门研究在通用字母的情况下这些概念与其他相关概念(例如回文丰富度)之间的关系。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号