Source: IEEE Transactions on Knowledge and Data Engineering

Efficient Evaluation of Continuous Text Search Queries

Abstract

Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents in main memory with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. We distinguish between two versions of the monitoring algorithm, an eager and a lazy one, which differ in how aggressively they manage the thresholds on the inverted index. Using benchmark queries over a stream of real documents, we experimentally verify the efficiency of our methodology; both its versions are at least an order of magnitude faster than a competitor constructed from existing techniques, with lazy being the best approach overall.
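
To make the setting concrete, the following is a minimal sketch of a continuous top-k text monitor over a sliding window. It is not the paper's algorithm: the threshold-based eager and lazy variants are not reproduced here, since the abstract does not specify them. The sketch assumes a count-based window, a dot product of term weights as the similarity measure, and hypothetical names (`SlidingWindowMonitor`, `register`, `arrive`). It keeps the windowed documents in an in-memory inverted index, updates each result list incrementally on arrival, and recomputes a list only when one of its members expires.

```python
from collections import defaultdict, deque


class SlidingWindowMonitor:
    """Naive continuous top-k monitor (illustrative; not the paper's method)."""

    def __init__(self, window_size, k):
        self.window_size = window_size   # assumption: count-based window
        self.k = k
        self.window = deque()            # doc ids, oldest first
        self.docs = {}                   # doc id -> {term: weight}
        self.index = defaultdict(dict)   # inverted index: term -> {doc id: weight}
        self.queries = {}                # query id -> {term: weight}
        self.results = {}                # query id -> [(doc id, score)], best first

    def register(self, query_id, term_weights):
        """Register a continuous query and compute its initial result list."""
        self.queries[query_id] = dict(term_weights)
        self.results[query_id] = self._top_k(query_id)

    def arrive(self, doc_id, term_weights):
        """Handle a document arrival, expiring the oldest document if needed."""
        if len(self.window) == self.window_size:
            self._expire(self.window.popleft())
        self.window.append(doc_id)
        self.docs[doc_id] = dict(term_weights)
        for term, weight in term_weights.items():
            self.index[term][doc_id] = weight
        # Incremental update: score the new document against each query.
        for query_id, query in self.queries.items():
            score = sum(qw * term_weights.get(t, 0.0) for t, qw in query.items())
            if score > 0:
                ranked = self.results[query_id] + [(doc_id, score)]
                ranked.sort(key=lambda pair: (-pair[1], pair[0]))
                self.results[query_id] = ranked[: self.k]

    def _expire(self, doc_id):
        """Remove a document from the index and repair affected result lists."""
        for term in self.docs.pop(doc_id):
            postings = self.index[term]
            del postings[doc_id]
            if not postings:
                del self.index[term]
        for query_id, ranked in self.results.items():
            if any(d == doc_id for d, _ in ranked):
                self.results[query_id] = self._top_k(query_id)

    def _top_k(self, query_id):
        """Recompute a result list by scanning the query terms' posting lists."""
        scores = defaultdict(float)
        for term, qw in self.queries[query_id].items():
            for doc_id, dw in self.index.get(term, {}).items():
                scores[doc_id] += qw * dw
        ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
        return ranked[: self.k]


monitor = SlidingWindowMonitor(window_size=2, k=1)
monitor.register("q1", {"stream": 1.0})
monitor.arrive(1, {"stream": 0.5})
monitor.arrive(2, {"stream": 0.9, "text": 0.1})
print(monitor.results["q1"])
```

The full recomputation in `_top_k` on every relevant expiration is the expensive step in this sketch; the abstract indicates that the proposed method avoids such work by maintaining thresholds on the inverted index.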
