Compression and Predictive Distributions for Large Alphabets.

Abstract

Data generated from large alphabets are found almost everywhere, for example in text, images, and video. Traditional universal compression algorithms mostly address small alphabets and implicitly assume an asymptotic regime in which the extra bits incurred by the compression procedure vanish as the amount of data grows without bound. In this thesis, we focus on compression and prediction for large alphabets, where the alphabet size is comparable to or larger than the sample size.

We first consider sequences of random variables drawn independently and identically from a large alphabet. In particular, the sample size is allowed to be variable. A product distribution based on Poisson sampling and tiling is proposed as the coding distribution; its independence structure greatly simplifies both implementation and analysis. We characterize the behavior of this coding distribution through a condition on the tail sum of the ordered counts, apply it to sequences satisfying that condition, and further apply the method to envelope classes. The coding distribution also provides a convenient way to compute Shtarkov's normalized maximum likelihood (NML) distribution approximately, and the extra price paid for this convenience is small compared to the total cost. Moreover, the same coding distribution can be used to compute the NML distribution exactly, and the calculation remains simple thanks to its independence structure.

We then consider a more realistic class, the Markov class, and in particular tree sources. A context-tree-based algorithm is designed to capture the dependencies among contexts: it is a greedy algorithm that seeks the greatest savings in codelength as the tree is constructed. Compression and prediction of the individual counts associated with each context use the same coding distribution as in the i.i.d. case. Combining the two procedures yields a compression algorithm based on the tree model.

Results of simulation and real-data experiments for both the i.i.d. and Markov models are included to illustrate the performance of the proposed algorithms.
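The independence the abstract attributes to Poisson sampling is the classical Poissonization identity: if the sample size N is itself drawn as Poisson(λ) and symbols are then drawn i.i.d. with probabilities p_j, the symbol counts become independent Poisson(λ·p_j) variables, so a coding distribution over counts factors into a product. A quick numerical check of this identity (standard probability, not code from the thesis):

```python
import math

def pois(n, lam):
    """Poisson pmf."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

p = [0.5, 0.3, 0.2]   # symbol probabilities (toy example)
lam = 4.0             # mean sample size under Poisson sampling
counts = (3, 1, 2)
n = sum(counts)

# Direct route: draw N ~ Poisson(lam), then a size-N multinomial.
multinom = math.factorial(n)
for c in counts:
    multinom //= math.factorial(c)
direct = pois(n, lam) * multinom * math.prod(pj ** c for pj, c in zip(p, counts))

# Product route: independent Poisson(lam * p_j) counts.
product = math.prod(pois(c, lam * pj) for c, pj in zip(counts, p))

print(direct, product)  # the two routes agree
```

Because the two routes agree, each count can be coded on its own, which is the simplification through independence that the abstract refers to.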
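Shtarkov's NML distribution mentioned above has the closed form q(x^n) = p̂(x^n) / Σ_{y^n} p̂(y^n), where p̂ is the maximized likelihood. For reference only, here is a brute-force Python sketch that computes the exact NML for an i.i.d. source at toy scale; the enumeration it relies on is exactly what the thesis's product coding distribution avoids:

```python
import math
from collections import Counter
from itertools import product

def ml_prob(seq):
    """Maximized i.i.d. likelihood of seq: product over symbols of (n_j/n)^n_j."""
    n = len(seq)
    return math.prod((c / n) ** c for c in Counter(seq).values())

def exact_nml(n, alphabet):
    """Exact Shtarkov NML by enumerating all length-n strings (toy scale only)."""
    ml = {s: ml_prob(s) for s in product(alphabet, repeat=n)}
    C = sum(ml.values())               # Shtarkov sum; log2(C) is the minimax regret
    return {s: p / C for s, p in ml.items()}, C

q, C = exact_nml(n=4, alphabet="abc")
print(f"minimax regret log2(C) = {math.log2(C):.4f} bits")
print("q(aaaa) =", q[('a',) * 4])
```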
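The abstract does not spell out the splitting criterion for the context tree, so the sketch below is illustrative only: it grows a binary context tree greedily, splitting a context whenever deeper contexts save codelength, with the Krichevsky–Trofimov (KT) codelength and a fixed per-split penalty standing in (both assumptions here) for the thesis's per-context coding distribution and its codelength accounting:

```python
import math
from collections import Counter

ALPHABET = "01"
K = len(ALPHABET)

def kt_bits(counts):
    """Krichevsky-Trofimov codelength in bits for per-symbol counts (K-ary)."""
    bits, n = 0.0, 0
    for c in counts:
        for i in range(c):
            bits -= math.log2((i + 0.5) / (n + K / 2))
            n += 1
    return bits

def next_counts(seq, ctx):
    """How often each symbol follows context ctx in seq."""
    d = len(ctx)
    cnt = Counter(seq[i + d] for i in range(len(seq) - d) if seq[i:i + d] == ctx)
    return [cnt[a] for a in ALPHABET]

def grow(seq, ctx="", max_depth=3, penalty=1.0):
    """Greedy top-down split: deepen a context while doing so saves codelength.

    `penalty` is a crude stand-in for the model cost of a split (an assumption,
    not the thesis's accounting); boundary positions lacking a full deeper
    context are simply dropped in this toy.
    """
    parent = kt_bits(next_counts(seq, ctx))
    if len(ctx) < max_depth:
        kids = [a + ctx for a in ALPHABET]   # one more symbol of past context
        child_sum = sum(kt_bits(next_counts(seq, child)) for child in kids)
        if parent - child_sum > penalty:     # split only if savings beat the cost
            return {child: grow(seq, child, max_depth, penalty) for child in kids}
    return round(parent, 2)                  # leaf: bits to code its counts

print(grow("01011001011010100110100101"))
```

Each leaf's counts would then be compressed with the same coding distribution as in the i.i.d. case, as the abstract describes.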

Bibliographic record

  • Author: Yang, Xiao
  • Affiliation: Yale University
  • Degree grantor: Yale University
  • Subjects: Statistics; Electrical engineering
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 107
  • Format: PDF
  • Language: English
