Statistical Properties of Ordered Alphabetical Coding

Abstract

The paper presents a type of text coding called αβ-coding. In αβ-coding, the letters of every word in a given text are arranged in a specific way to create a code for that word. The list of codes obtained by scanning text corpora is stored in a database together with the words that can be transformed into each code, along with the word frequencies. Decoding is performed by transforming a (possibly scrambled) word according to the coding algorithm and finding in the database the most frequent word corresponding to the resulting code. Because more than one word may produce the same code, decoding is inherently ambiguous. However, a study on corpora of five languages has shown that about 95% of word tokens can be decoded correctly.
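The abstract does not state exactly how the letters are arranged. The following is a minimal sketch of the encode/store/decode cycle, assuming the code is simply the word's lowercase letters sorted alphabetically. The name "ordered alphabetical coding" suggests this, but it is an assumption and the paper's actual arrangement may differ. The function names `ab_code`, `build_database`, `decode`, and `token_accuracy` are hypothetical, and an in-memory dictionary stands in for the database.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional


def ab_code(word: str) -> str:
    """Hypothetical alpha-beta code (assumption): lowercase letters sorted alphabetically."""
    return "".join(sorted(word.lower()))


def build_database(tokens: Iterable[str]) -> Dict[str, Counter]:
    """Map each code to a Counter of the (lowercased) words that produce it."""
    db: Dict[str, Counter] = defaultdict(Counter)
    for token in tokens:
        db[ab_code(token)][token.lower()] += 1
    return db


def decode(scrambled: str, db: Dict[str, Counter]) -> Optional[str]:
    """Return the most frequent word sharing the scrambled word's code, or None."""
    candidates = db.get(ab_code(scrambled))
    if not candidates:
        return None  # code never seen in the corpus
    # On equal counts, Counter.most_common keeps the word encountered first.
    return candidates.most_common(1)[0][0]


def token_accuracy(tokens: List[str], db: Dict[str, Counter]) -> float:
    """Fraction of tokens whose code decodes back to the token itself."""
    correct = sum(decode(t, db) == t.lower() for t in tokens)
    return correct / len(tokens)


corpus = "the cat sat on the mat and the cat did an act".split()
db = build_database(corpus)
print(decode("tca", db))  # → cat  ("cat" and "act" share the code "act"; "cat" is more frequent)
print(decode("hte", db))  # → the
print(round(token_accuracy(corpus, db), 3))  # → 0.917
```

In this toy corpus the only mis-decoded token is the single *act*, which decodes to the more frequent *cat*, giving 11 of 12 tokens correct. This illustrates the kind of token-level accuracy the abstract reports; it is not the paper's data or method of evaluation.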
