Data generated from large alphabets are everywhere in our lives, for example in texts, images, and videos. Traditional universal compression algorithms mostly involve small alphabets and implicitly assume an asymptotic regime in which the extra bits incurred in the compression process vanish as the amount of data grows without bound. In this thesis, we focus on compression and prediction for large alphabets, where the alphabet size is comparable to or larger than the sample size.

We first consider sequences of random variables generated independently and identically from a large alphabet. In particular, the sample size is allowed to be variable. A product distribution based on Poisson sampling and tiling is proposed as the coding distribution; its independence structure greatly simplifies both implementation and analysis. Moreover, we characterize the behavior of this coding distribution through a condition on the tail sum of the ordered counts, and apply it to sequences satisfying this condition. We further apply the method to envelope classes. The coding distribution provides a convenient way to approximately compute Shtarkov's normalized maximum likelihood (NML) distribution, and the extra price paid for this convenience is small compared with the total cost. Furthermore, we find that the same coding distribution can also be used to compute the NML distribution exactly, and this computation remains simple owing to the independence of the coding distribution.

Next, we consider a more realistic class, the Markov class, and in particular tree sources. A context-tree-based algorithm is designed to describe the dependencies among contexts. It is a greedy algorithm that seeks the greatest savings in codelength while constructing the tree. Compression and prediction of the individual counts associated with the contexts use the same coding distribution as in the i.i.d. case. Combining these two procedures, we obtain a compression algorithm based on the tree model.

Results of simulations and real-data experiments for both the i.i.d. model and the Markov model are included to illustrate the performance of the proposed algorithms.
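For reference, the NML distribution mentioned above is the standard minimax-regret-optimal coding distribution. For a model class $\{p_\theta\}$ over sequences $x^n$ of length $n$, it is defined as

\[
  q_{\mathrm{NML}}(x^n) \;=\; \frac{\sup_{\theta} p_{\theta}(x^n)}{\sum_{y^n} \sup_{\theta} p_{\theta}(y^n)},
\]

so the codelength $-\log q_{\mathrm{NML}}(x^n)$ exceeds the best codelength in hindsight by the logarithm of the normalizer, uniformly over all sequences. The normalizing sum runs over all sequences of length $n$, which is precisely what makes NML expensive to evaluate over a large alphabet and what motivates approximating it with an independent product distribution as described above.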
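To illustrate the greedy context-tree construction in the Markov part, here is a minimal, hypothetical Python sketch of the splitting rule it suggests: a context is refined by one more past symbol only when the children's total codelength beats the parent's. The function and variable names (grow, next_symbol_counts, codelength) are invented for illustration, and codelength is a crude MDL-style placeholder rather than the thesis's Poisson-sampling-based coding distribution.

import math
from collections import Counter

def codelength(counts):
    # Ideal codelength (bits) of counts under their empirical distribution,
    # plus a rough MDL-style parameter cost. Placeholder for the thesis's
    # Poisson-sampling-based coding distribution.
    n = sum(counts.values())
    if n == 0:
        return 0.0
    data_bits = sum(-c * math.log2(c / n) for c in counts.values())
    return data_bits + 0.5 * len(counts) * math.log2(n + 1)

def next_symbol_counts(seq, ctx):
    # Counts of the symbol that follows each occurrence of context ctx.
    d = len(ctx)
    return Counter(seq[i] for i in range(d, len(seq)) if seq[i - d:i] == ctx)

def grow(seq, ctx="", max_depth=3):
    # Greedily extend ctx by one more past symbol whenever the split
    # reduces the total codelength; otherwise keep ctx as a leaf.
    if len(ctx) >= max_depth:
        return [ctx]
    parent_cost = codelength(next_symbol_counts(seq, ctx))
    # Candidate children: contexts one symbol longer than ctx.
    kids = sorted({seq[i - len(ctx) - 1] + ctx
                   for i in range(len(ctx) + 1, len(seq))
                   if seq[i - len(ctx):i] == ctx})
    child_cost = sum(codelength(next_symbol_counts(seq, k)) for k in kids)
    if kids and child_cost < parent_cost:
        leaves = []
        for k in kids:
            leaves += grow(seq, k, max_depth)
        return leaves
    return [ctx]

# Example: the returned leaves are the contexts whose counts are then
# compressed with the same coding distribution as in the i.i.d. case.
print(grow("abababababbbabababab", max_depth=2))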