
Compact inverted index storage using general-purpose compression libraries


Abstract

Efficient storage of large inverted indexes is one of the key technologies that support current web search services. Here we re-examine mechanisms for representing document-level inverted indexes and within-document term frequencies, including comparing specialized methods developed for this task against recent fast implementations of general-purpose adaptive compression techniques. Experiments with the Gov2-URL collection and a large collection of crawled news stories show that standard compression libraries can provide compression effectiveness as good as or better than previous methods, with decoding rates only moderately slower than reference implementations of those tailored approaches. This surprising outcome means that high-performance index compression can be achieved without requiring the use of specialized implementations.
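
To make the comparison in the abstract concrete, here is a minimal sketch of one way a posting list could be handed to a general-purpose compression library. The abstract names neither the specialized methods nor the libraries tested, so everything below is an assumption chosen for illustration: the posting list is converted to d-gaps, serialized with a simple variable-byte code, and then passed to Python's standard `zlib` module. The function names (`to_gaps`, `vbyte_encode`, `compress_postings`, and so on) are hypothetical and are not taken from the paper.

```python
import zlib


def to_gaps(doc_ids):
    """Turn a strictly increasing list of document ids into d-gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Invert to_gaps by taking a running sum."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, low-order bits first.
    The high bit is set on the final byte of each number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:  # final byte of this number
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


def compress_postings(doc_ids, level=6):
    """Illustrative pipeline (an assumption, not the paper's method):
    d-gaps -> variable-byte -> zlib."""
    return zlib.compress(vbyte_encode(to_gaps(doc_ids)), level)


def decompress_postings(blob):
    """Recover the original document ids from a compressed blob."""
    return from_gaps(vbyte_decode(zlib.decompress(blob)))
```

A round trip such as `decompress_postings(compress_postings([3, 7, 8, 200, 100000]))` returns the original list. Whether the library stage actually saves space depends on the length and regularity of the list; very short lists can grow because of the library's fixed header overhead.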

Bibliographic details

  • Source
    Software | 2018, Issue 4 | pp. 974-982 | 9 pages
  • Author affiliations

    Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic 3010, Australia;

    Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic 3010, Australia;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    index compression; inverted index; web search;

