Journal: IEEE Transactions on Information Theory

On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts

Abstract

This paper presents a new interpretation for Zipf–Mandelbrot's law in natural language which rests on two areas of information theory. Firstly, we construct a new class of grammar-based codes and, secondly, we investigate properties of strongly nonergodic stationary processes. The motivation for the joint discussion is to prove a proposition with a simple informal statement: If a text of length $n$ describes $n^{\beta}$ independent facts in a repetitive way, then the text contains at least $n^{\beta}/\log n$ different words, under suitable conditions on $n$. In the formal statement, two modeling postulates are adopted. Firstly, the words are understood as nonterminal symbols of the shortest grammar-based encoding of the text. Secondly, the text is assumed to be emitted by a finite-energy strongly nonergodic source whereas the facts are binary IID variables predictable in a shift-invariant way.
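
To give a feel for the scale of the informal bound $n^{\beta}/\log n$, the following minimal sketch evaluates it for a few text lengths. It is an illustration only: the function name `vocabulary_lower_bound`, the choice of the natural logarithm, and the sample values of $n$ and $\beta$ are assumptions made here, and any constant factors or the "suitable conditions on $n$" from the formal statement are ignored.

```python
import math


def vocabulary_lower_bound(n: int, beta: float) -> float:
    """Evaluate the informal quantity n**beta / log(n).

    Assumptions (not taken from the paper): natural logarithm,
    no constant factors, and n > 1 so that log(n) is positive.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta is assumed to lie in (0, 1]")
    return n ** beta / math.log(n)


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to show how the bound grows.
    for n in (10**4, 10**6, 10**8):
        for beta in (0.5, 0.8):
            bound = vocabulary_lower_bound(n, beta)
            print(f"n={n:>10}  beta={beta}  n^beta/log n ~ {bound:.1f}")
```

For a fixed $\beta$ the quantity grows almost as a power of the text length, which is the qualitative behavior the abstract connects to the growth of vocabulary in natural language.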