METHOD OF COMPRESSING INFORMATION AND AN APPARATUS FOR COMPRESSING ENGLISH TEXT

Abstract

There is described a method of storing alphanumeric text in a random access memory or disk file where the text is received as a string of ASCII coded characters that are then separated into tokens which may be either groups of alpha characters, numeric characters or punctuation characters. Each alpha token is encoded by comparing the token with a table of words stored in a global dictionary. If present in the dictionary, the token is stored in the memory as two or three four-bit nibbles which identify its location in the dictionary. If not in the dictionary, the characters in the front of the token are compared with a list of word beginnings or prefixes. If a match is found, two nibbles are stored in the random access memory identifying the prefix, the prefix characters are stripped from the token, and the process is repeated. If no more prefixes are found, the end of the word is matched against a stored group of word endings or suffixes. Two nibbles are stored in the random access memory identifying the suffix, and the suffix characters are stripped from the token. The suffix matching is repeated on the remaining ending characters of the token. If there are no more identifiable suffixes, the number of characters remaining in the stem of the token is determined. After all letters are removed by identifying all suffixes, the remaining stem is encoded as one or two nibbles for each letter plus a nibble identifying the length of the stem. If no suffix was identified, a nibble is stored indicating that the stem is either four or five characters in length, or the actual length of the stem is stored as an additional nibble. After the stem length has been encoded, the individual letters of the stem are encoded by one or two nibbles for each letter as determined from tables of individual characters.
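The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tables `GLOBAL_DICT`, `PREFIXES`, `SUFFIXES`, `COMMON` and `RARE`, the escape nibble value 15, and every function name are hypothetical, since the abstract does not list the real tables or the exact nibble layout. The sketch also simplifies several points: it lowercases tokens, drops whitespace, always keeps a non-empty stem, stores the stem length as one plain nibble (the four-or-five-character special case is not modeled), uses only the two-nibble dictionary form, and leaves numeric and punctuation tokens unencoded.

```python
import re

# Hypothetical toy tables: the abstract does not list the real ones.
GLOBAL_DICT = ["the", "of", "and", "text", "memory"]
PREFIXES = ["un", "re", "pre"]
SUFFIXES = ["ing", "ed", "s", "ly"]
COMMON = "etaoinshrdlucm"   # 14 letters, one nibble each (values 0-13)
RARE = "bfgjkpqvwxyz"       # 12 letters, escape nibble 15 plus an index nibble


def tokenize(text):
    """Split text into runs of letters, digits, or punctuation (whitespace dropped)."""
    return re.findall(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]+", text)


def two_nibbles(i):
    """Split a table index below 256 into a high and a low four-bit nibble."""
    return [(i >> 4) & 0xF, i & 0xF]


def strip_affixes(word, table, at_front):
    """Repeatedly strip the first matching affix, collecting its two-nibble code."""
    codes = []
    matched = True
    while matched:
        matched = False
        for i, affix in enumerate(table):
            hit = word.startswith(affix) if at_front else word.endswith(affix)
            if hit and len(word) > len(affix):  # assumption: keep a non-empty stem
                word = word[len(affix):] if at_front else word[:-len(affix)]
                codes.append(two_nibbles(i))
                matched = True
                break
    return word, codes


def encode_letters(stem):
    """Encode each letter as one nibble (common) or two nibbles (rare)."""
    nibbles = []
    for ch in stem:
        if ch in COMMON:
            nibbles.append(COMMON.index(ch))
        else:
            nibbles += [15, RARE.index(ch)]
    return nibbles


def encode_alpha(token):
    """Encode one alpha token: dictionary hit, else prefixes, suffixes, then stem."""
    word = token.lower()
    if word in GLOBAL_DICT:
        return [("dict", two_nibbles(GLOBAL_DICT.index(word)))]
    word, prefix_codes = strip_affixes(word, PREFIXES, at_front=True)
    word, suffix_codes = strip_affixes(word, SUFFIXES, at_front=False)
    if len(word) > 15:
        raise ValueError("stem too long for a single length nibble")
    out = [("prefix", code) for code in prefix_codes]
    out += [("suffix", code) for code in suffix_codes]
    # Stem: one length nibble followed by the per-letter nibbles.
    out.append(("stem", [len(word)] + encode_letters(word)))
    return out
```

With these toy tables, `tokenize("Unlocked 42 doors!")` yields the four tokens `Unlocked`, `42`, `doors` and `!`, and `encode_alpha("Unlocked")` produces one prefix code for "un", one suffix code for "ed", and a stem entry for "lock" in which the letter "k" costs two nibbles because it falls in the rare-letter table.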

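Bibliographic data

  • Publication number: DE3277556D1

    Patent type:

  • Publication date: 1987-12-03

    Original format: PDF

  • Applicant/patentee: SYSTEM DEVELOPMENT CORPORATION

    Application number: DE19823277556T

  • Inventor: SNOW CRAIG ADAM

    Filing date: 1982-09-25

  • Classification: G06F15/20

  • Country: DE

  • Database entry time: 2022-08-22 06:52:47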
