Sequence-Patterns Entropy and Infinite Alphabets

Abstract

The entropy of patterns of sequences generated by independent and identically distributed (i.i.d.) sources with unknown, large, possibly infinite alphabets is investigated. A pattern is a sequence of indices that contains all consecutive integer indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. This pattern can in turn be compressed by itself. We extend our previous upper bounds on the entropy of patterns generated by a bounded alphabet to unbounded, possibly infinite, alphabets. Unlike the bounded case, we now allow alphabets with symbols that occur with both high and very low probabilities. We study the effect of all very-low-probability letters on the pattern entropy. All the low-probability letters are collapsed into one symbol. Beyond the contribution of that symbol to the entropy, and unlike i.i.d. sequences, the additional contribution to the entropy of patterns of length $n$ of all letters with probability $1/n^{1+\varepsilon}$ or smaller, for some arbitrarily small $\varepsilon$, is shown to be $o(n)$ over the whole sequence (and $o(1)$ per symbol). The same contribution of all letters with probability $1/n^{2+\varepsilon}$ or smaller is shown to be $o(1)$ for the whole sequence. If an i.i.d. source with an infinite alphabet has only letters with probability $1/n^{2+\varepsilon}$, the entropy of its patterns approaches zero, i.e., the only likely pattern is the pattern $123\cdots n$. This is in contrast to the i.i.d. entropy, which is super-linear in $n$. The results are derived through the design of a low-complexity sequential coding method for patterns that achieves the upper bound.
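The pattern definition can be made concrete with a short sketch. It is written in Python for illustration only; the function names `pattern_and_dictionary` and `reconstruct` are hypothetical and do not come from the paper. It also illustrates one reading of the remark about the unknown alphabet: a sequence is fully recovered from its pattern together with the list of its distinct symbols in order of first occurrence, so once those symbols must be coded anyway, what remains to compress is the pattern.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def pattern_and_dictionary(seq: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Split seq into its pattern and its dictionary (hypothetical helper).

    The pattern replaces each symbol by the 1-based rank of its first
    occurrence among the distinct symbols seen so far; the dictionary lists
    the distinct symbols in that same first-occurrence order.
    """
    index: Dict[Hashable, int] = {}
    dictionary: List[Hashable] = []
    pat: List[int] = []
    for sym in seq:
        if sym not in index:
            dictionary.append(sym)
            index[sym] = len(dictionary)  # next unused index: 1, 2, 3, ...
        pat.append(index[sym])
    return pat, dictionary


def reconstruct(pat: List[int], dictionary: List[Hashable]) -> List[Hashable]:
    """Invert the split: look each pattern index up in the dictionary."""
    return [dictionary[i - 1] for i in pat]


if __name__ == "__main__":
    pat, dictionary = pattern_and_dictionary("abracadabra")
    print(pat)         # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
    print(dictionary)  # ['a', 'b', 'r', 'c', 'd']
```

Relabeling the symbols leaves the pattern unchanged (for example, `[7, 9, 4, 7]` and `"xyzx"` both give `[1, 2, 3, 1]`), and a sequence with no repeated symbol always has pattern `[1, 2, ..., n]`, which is the pattern $123\cdots n$ that the abstract identifies as the only likely one when every letter is very improbable.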
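The claim that $123\cdots n$ is the only likely pattern when every letter has probability at most $1/n^{2+\varepsilon}$ can be sanity-checked with a standard union bound. This is our own illustration, not necessarily the argument used in the paper. Write $X_1, \dots, X_n$ for the i.i.d. symbols and $p_a$ for the probability of letter $a$, and assume $\max_a p_a \le n^{-(2+\varepsilon)}$. The pattern differs from $123\cdots n$ only if some symbol repeats, and since $\sum_a p_a^2 \le \max_a p_a \sum_a p_a = \max_a p_a$,

$$
\Pr\{\exists\, i<j : X_i = X_j\} \le \sum_{i<j} \Pr\{X_i = X_j\} = \binom{n}{2}\sum_a p_a^2 \le \binom{n}{2}\, n^{-(2+\varepsilon)} < \frac{n^{-\varepsilon}}{2} \longrightarrow 0 .
$$

This only shows that $123\cdots n$ is the likely pattern. Showing that the pattern entropy itself vanishes needs more than this bound, since the rare repeat events can still carry entropy; that is where the finer analysis described in the abstract is required.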
