首页> 外文期刊>ACM Transactions on Information Systems >Fast Candidate Generation for Real-Time Tweet Search with Bloom Filter Chains
【24h】

Fast Candidate Generation for Real-Time Tweet Search with Bloom Filter Chains

机译:带有Bloom过滤器链的实时推文搜索的快速候选生成

获取原文
获取原文并翻译 | 示例

摘要

The rise of social media and other forms of user-generated content have created the demand for real-time search: against a high-velocity stream of incoming documents, users desire a list of relevant results at the time the query is issued. In the context of real-time search on tweets, this work explores candidate generation in a two-stage retrieval architecture where an initial list of results is processed by a second-stage rescorer to produce the final output. We introduce Bloom filter chains, a novel extension of Bloom filters that can dynamically expand to efficiently represent an arbitrarily long and growing list of monotonically-increasing integers with a constant false positive rate. Using a collection of Bloom filter chains, a novel approximate candidate generation algorithm called BWand is able to perform both conjunctive and disjunctive retrieval. Experiments show that our algorithm is many times faster than competitive baselines and that this increased performance does not require sacrificing end-to-end effectiveness. Our results empirically characterize the trade-off space defined by output quality, query evaluation speed, and memory footprint for this particular search architecture.
机译:社交媒体和其他形式的用户生成的内容的兴起引起了对实时搜索的需求:针对高速传入文档流,用户希望在发出查询时列出相关结果。在实时搜索推文的上下文中,这项工作探索了两阶段检索体系结构中的候选对象生成,其中初始结果列表由第二阶段记录员处理以产生最终输出。我们介绍了Bloom过滤器链,它是Bloom过滤器的新颖扩展,可以动态扩展以有效地表示任意长且不断增长的,具有恒定误报率的单调递增整数列表。使用Bloom过滤器链的集合,一种称为BWand的新颖的近似候选生成算法能够执行合取和析取检索。实验表明,我们的算法比竞争基准要快许多倍,并且这种提高的性能不需要牺牲端到端的有效性。我们的结果从经验上描述了此特定搜索体系结构的权衡空间,该权衡空间由输出质量,查询评估速度和内存占用量定义。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号