
Compact Indexing and Judicious Searching for Billion-Scale Microblog Retrieval

Abstract

In this article, we study the problem of efficient top-k disjunctive query processing over a very large microblog dataset. For compact indexing, we categorize keywords into rare terms and common terms based on inverse document frequency (idf) and propose a tailored block-oriented organization for each category to reduce memory consumption. For fast searching, we classify queries into three types based on the categories of their terms and design an efficient search algorithm for each type. We conducted extensive experiments on a billion-scale Twitter dataset and examined performance with both simple and more advanced ranking functions. The results show that, with a much smaller index, our search algorithm achieves a 2-3x speedup over state-of-the-art solutions in both ranking scenarios.
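
The idf-based split of terms and the term-category-based split of queries described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `categorize_terms` and `classify_query`, the idf threshold, the natural-log idf formula, and the three labels ("rare-only", "common-only", "mixed") are all assumptions made for illustration, since the abstract does not state how the threshold is chosen or how the three query types are defined.

```python
import math
from collections import Counter


def categorize_terms(docs, idf_threshold):
    """Split the vocabulary into (rare, common) term sets by idf.

    idf(t) = ln(N / df(t)), where N is the number of documents and df(t)
    is the number of documents containing t. The threshold is an
    illustrative assumption, not a value taken from the article.
    """
    num_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    rare, common = set(), set()
    for term, freq in doc_freq.items():
        if math.log(num_docs / freq) >= idf_threshold:
            rare.add(term)
        else:
            common.add(term)
    return rare, common


def classify_query(query_terms, common):
    """Label a query by the categories of its terms (assumed three-way split).

    Terms absent from `common`, including unseen terms, count as rare.
    """
    if not query_terms:
        raise ValueError("query must contain at least one term")
    has_common = any(t in common for t in query_terms)
    has_rare = any(t not in common for t in query_terms)
    if has_common and has_rare:
        return "mixed"
    return "common-only" if has_common else "rare-only"


# Toy corpus: each document is a list of tokens.
docs = [["a", "b"], ["a", "c"], ["a", "d"], ["a", "b"]]
rare, common = categorize_terms(docs, idf_threshold=1.0)
```

On this toy corpus, "a" and "b" land in the common set and "c" and "d" in the rare set, so `classify_query(["a", "c"], common)` is labeled "mixed". A search front end could dispatch on that label to a type-specific algorithm, which is the role the abstract assigns to query classification.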

Bibliographic record

  • Source
    ACM Transactions on Information Systems | 2017, Issue 3 | pp. 27.1-27.24 | 24 pages
  • Author affiliations

    Univ Elect Sci & Technol China, Chengdu, Peoples R China|Qingshuihe Campus 2006,Xiyuan Ave, Chengdu, Sichuan, Peoples R China;

    Shandong Univ, Jinan, Peoples R China|High Tech Ind Dev Zone, 1500 Sunhua Lu, Jinan, Peoples R China;

    Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China;

    Natl Univ Singapore, Sch Comp, Singapore, Singapore;

    Natl Univ Singapore, Sch Comp, Singapore, Singapore;

    Univ Elect Sci & Technol China, Chengdu, Peoples R China|Qingshuihe Campus 2006,Xiyuan Ave, Chengdu, Sichuan, Peoples R China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Top-k; disjunctive keyword search; microblog; billion-scale;
