
Smaller Self-indexes for Natural Language


Abstract

Self-indexes for natural-language texts, which treat the text as a sequence of tokens (words or separators), achieve very attractive space and search time. However, they pay a space penalty because of their large vocabulary. In this paper we show that replacing the Huffman encoding they implicitly use with the slightly weaker Hu-Tucker encoding, which respects the lexical order of the vocabulary, improves both their space and time.
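To make the phrase "slightly weaker" concrete, the sketch below compares the total encoded length of a Huffman code with that of the best order-preserving (alphabetic) prefix code over the same word frequencies. It is an illustration only, not the paper's method: the vocabulary and frequencies are invented, the function names are hypothetical, and the alphabetic cost is computed with a simple cubic dynamic program rather than the Hu-Tucker algorithm itself, which reaches the same optimal cost faster.

```python
import heapq
from itertools import accumulate


def huffman_cost(weights):
    """Total encoded length in bits (sum of weight * code length) of a Huffman code."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def alphabetic_cost(weights):
    """Total encoded length of the best prefix code that keeps symbols in the given order.

    Interval dynamic program, O(n^3): cost[i][j] is the optimal cost of a
    code tree whose leaves are exactly weights[i..j], left to right.
    """
    n = len(weights)
    prefix = [0] + list(accumulate(weights))
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = min(cost[i][k] + cost[k + 1][j] for k in range(i, j))
            cost[i][j] = best + prefix[j + 1] - prefix[i]
    return cost[0][n - 1]


# Hypothetical toy vocabulary, listed in lexical order, with invented counts.
vocabulary = ["a", "cat", "of", "the", "zebra"]
frequencies = [20, 3, 15, 40, 1]

if __name__ == "__main__":
    print("Huffman bits:   ", huffman_cost(frequencies))
    print("Alphabetic bits:", alphabetic_cost(frequencies))
```

The alphabetic code can never be shorter than the Huffman code, because it is optimal only among codes whose codewords sort in the same order as the words. What that restriction buys is that comparing codewords is equivalent to comparing the words themselves, which is the property the abstract refers to as respecting the lexical order of the vocabulary.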
