
Universal compression and probability estimation over unknown alphabets.



Abstract

We consider the problems of compressing i.i.d. sequences and estimating their underlying distribution when the alphabet is unknown. A sequence can be described by separately conveying its symbols and its pattern, the order in which the symbols appear. We show that the information contained in the pattern grows at the same asymptotic rate as the information contained in the sequence. However, the worst-case per-symbol redundancy of compressing patterns differs sharply from that of compressing sequences. The latter is known to grow without bound as the alphabet size increases, but we show that the former diminishes to zero. More precisely, the patterns of i.i.d. sequences over all alphabets, including infinite and even unknown ones, can be compressed with diminishing redundancy, both in blocks and sequentially, and the compression can be performed in linear time.

Employing the maximum-likelihood principle, we look for high-profile distributions: the distributions that maximize the profile probability of the observed data. Here the profile records how many distinct symbols appear once, how many appear twice, and so on. We derive conditions that characterize distributions at local maxima of the profile probability. We design several practical algorithms to estimate profile probabilities, and we use the Monte Carlo EM algorithm to find high-profile distributions. Experiments show that estimators based on high-profile distributions perform well in estimating several statistics.
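The sketch below illustrates the decomposition of a sequence into symbols and pattern. It assumes the standard definition from the pattern-compression literature, in which each symbol is replaced by the index of its first appearance. The code splits a sequence into its pattern and its dictionary of distinct symbols, rebuilds the sequence from the two, and computes the profile. The helper names `pattern`, `dictionary`, `reconstruct`, and `profile` are hypothetical illustrations and do not come from the thesis.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def pattern(seq: Sequence[Hashable]) -> list[int]:
    """Replace each symbol by the order (1, 2, ...) of its first appearance."""
    index: dict[Hashable, int] = {}
    out: list[int] = []
    for s in seq:
        if s not in index:
            index[s] = len(index) + 1  # a new symbol gets the next integer
        out.append(index[s])
    return out


def dictionary(seq: Sequence[Hashable]) -> list[Hashable]:
    """Distinct symbols listed in order of first appearance."""
    return list(dict.fromkeys(seq))


def reconstruct(pat: list[int], dic: list[Hashable]) -> list[Hashable]:
    """Recover the sequence from its pattern and dictionary."""
    return [dic[i - 1] for i in pat]


def profile(seq: Sequence[Hashable]) -> Counter:
    """Map each multiplicity mu to the number of distinct symbols seen exactly mu times."""
    return Counter(Counter(seq).values())
```

Take `"abracadabra"` as an example. Its pattern is `[1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]` and its dictionary is `['a', 'b', 'r', 'c', 'd']`. Its profile shows that one symbol appears five times, two symbols appear twice, and two appear once. The pattern and profile do not depend on which symbols were used, which is why they remain meaningful over an unknown alphabet.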
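The profile probability is the quantity maximized in the second part of the abstract. The sketch below computes it exactly for a known distribution `p` over a small finite alphabet. A profile is represented here, by assumption, as a list of multiplicities `mults`; for example, `[2, 1]` means one symbol appears twice and another appears once.

The code uses two methods. The first enumerates every length-n sequence, which takes exponential time and serves only as a check. The second relies on the fact that all patterns sharing a profile have the same probability, so the profile probability equals the number of such patterns times the probability of one of them. A toy search restricted to uniform distributions then chooses the support size with the highest profile probability.

This is a brute-force illustration for tiny inputs only. It is not the thesis's practical estimation algorithms or its Monte Carlo EM procedure, and all function names are hypothetical.

```python
import itertools
import math
from collections import Counter


def pattern_prob(mults: list[int], p: list[float]) -> float:
    """Probability of one pattern whose j-th new symbol occurs mults[j] times.

    Sums prod_j p[a_j] ** mults[j] over ordered tuples of distinct symbols a_j.
    """
    total = 0.0
    for syms in itertools.permutations(range(len(p)), len(mults)):
        total += math.prod(p[a] ** mu for a, mu in zip(syms, mults))
    return total


def num_patterns(mults: list[int]) -> int:
    """Number of patterns of length n = sum(mults) with these multiplicities.

    Patterns correspond to set partitions of the n positions, so the count is
    n! / (prod_j mults[j]! * prod_mu phi_mu!), where phi_mu is the number of
    symbols with multiplicity mu.
    """
    denom = math.prod(math.factorial(mu) for mu in mults)
    denom *= math.prod(math.factorial(phi) for phi in Counter(mults).values())
    return math.factorial(sum(mults)) // denom


def profile_prob(mults: list[int], p: list[float]) -> float:
    """Profile probability = (#patterns with this profile) * P(one such pattern)."""
    return num_patterns(mults) * pattern_prob(mults, p)


def profile_prob_bruteforce(mults: list[int], p: list[float]) -> float:
    """Check by summing P(sequence) over all sequences whose profile matches."""
    n = sum(mults)
    target = Counter(mults)
    total = 0.0
    for seq in itertools.product(range(len(p)), repeat=n):
        if Counter(Counter(seq).values()) == target:
            total += math.prod(p[s] for s in seq)
    return total


def best_uniform_support(mults: list[int], max_k: int) -> int:
    """Toy search: the support size k <= max_k whose uniform distribution
    gives the highest profile probability."""
    return max(range(1, max_k + 1),
               key=lambda k: profile_prob(mults, [1.0 / k] * k))
```

For the profile `[2, 1]` (as in the sequence `aab`), a uniform distribution over k symbols gives profile probability 3(k - 1)/k², which peaks at k = 2. The toy search therefore returns `2`. A real search would range over non-uniform distributions and over unknown support sizes, which is where the thesis's practical algorithms and Monte Carlo EM come in.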

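Bibliographic details

  • Author: Zhang, Junan
  • Affiliation: University of California, San Diego
  • Degree-granting institution: University of California, San Diego
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 102
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Radio electronics and telecommunication technology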
