Journal: Information Retrieval

Implicit indexing of natural language text by reorganizing bytecodes


Abstract

Word-based byte-oriented compression has succeeded on large natural language text databases by providing competitive compression ratios, fast random access, and direct sequential searching. We show that by just rearranging the target symbols of the compressed text into a tree-shaped structure, and using negligible additional space, we obtain a new implicitly indexed representation of the compressed text, where search times are drastically improved. The occurrences of a word can be listed directly, without any text scanning, and in general any inverted-index-like capability, such as efficient phrase searches, can be emulated without storing any inverted list information. We experimentally show that, when using little extra space, our proposal is not only much more efficient than sequential searches over compressed text, but also more efficient than explicit inverted indexes and other types of indexes. Our representation is especially successful when searching for single words and short phrases.
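
The idea of rearranging codeword bytes into a tree can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy dense code over a four-symbol "byte" alphabet (two stoppers, two continuers) so that multi-byte codewords appear even with a tiny vocabulary, and it uses linear scans where a real structure would use rank/select counters. The names `assign_codewords`, `build_tree`, `select`, and `locate` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy alphabet (an assumption for illustration): a real byte-oriented
# code would split the 256 byte values between these two roles.
STOPPERS = (2, 3)    # symbols that end a codeword
CONTINUERS = (0, 1)  # symbols that say "another symbol follows"


def encode_rank(rank):
    """Dense, prefix-free codeword for the word with this frequency rank."""
    code = [STOPPERS[rank % len(STOPPERS)]]
    x = rank // len(STOPPERS)
    while x > 0:
        x -= 1
        code.insert(0, CONTINUERS[x % len(CONTINUERS)])
        x //= len(CONTINUERS)
    return tuple(code)


def assign_codewords(words):
    """More frequent words get shorter codewords."""
    ranked = [w for w, _ in Counter(words).most_common()]
    return {w: encode_rank(r) for r, w in enumerate(ranked)}


def build_tree(words, codes):
    """Node () holds the first symbol of every codeword in text order;
    node `prefix` holds the next symbol of the codewords starting with it."""
    nodes = defaultdict(list)
    for w in words:
        code = codes[w]
        for k in range(len(code)):
            nodes[code[:k]].append(code[k])
    return nodes


def select(seq, symbol, j):
    """Index of the (j+1)-th occurrence of symbol in seq (naive scan)."""
    seen = -1
    for i, s in enumerate(seq):
        if s == symbol:
            seen += 1
            if seen == j:
                return i
    raise ValueError("fewer occurrences than requested")


def locate(nodes, codes, word):
    """List the text positions of word without scanning the text."""
    if word not in codes:
        return []
    code = codes[word]
    prefix, symbol = code[:-1], code[-1]
    # Occurrences of the last symbol in the deepest node for this word.
    positions = [i for i, s in enumerate(nodes[prefix]) if s == symbol]
    # Climb to the root: position p in a child node corresponds to the
    # (p+1)-th occurrence of the child's symbol in its parent.
    while prefix:
        prefix, symbol = prefix[:-1], prefix[-1]
        positions = [select(nodes[prefix], symbol, p) for p in positions]
    return positions


text = "the cat and the dog and the bird".split()
codes = assign_codewords(text)
tree = build_tree(text, codes)
print(locate(tree, codes, "the"))  # prints [0, 3, 6]
print(locate(tree, codes, "dog"))  # prints [4]
```

The tree stores exactly the same symbols as the compressed text, only regrouped by codeword prefix, which is why no separate inverted lists are kept in this sketch. A practical version would replace the linear `select` with precomputed counters so that each upward step takes near-constant time.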
