Conference: ACM SIGMOD International Conference on Management of Data

Identifying similarities, periodicities and bursts for online search queries


Abstract

We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts'), where each element of the time series is the number of times the query is issued on a given day. All of the methods we describe use sequences of this form and can be applied to time-series data generally. Our primary goal is the discovery of semantically similar queries, and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state of the art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time series. Finally, we propose a simple but effective method for identification of bursts (long or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time series. We conclude the presentation with the description of a tool that uses the described methods and serves as an interactive exploratory data discovery tool for the MSN query database.
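
The abstract names three building blocks without giving their details: compression with the best Fourier coefficients plus the energy of the omitted ones, discovery of important periods, and burst identification. The following is a minimal sketch of how such steps might look on a daily query-count series, not the authors' implementation. All function names and parameters (`best_coefficients`, `dominant_period`, `bursts`, `window`, `num_std`) are hypothetical, "best" is assumed to mean largest magnitude, and the burst rule (a trailing moving average exceeding its mean by a multiple of its standard deviation) is an assumption chosen for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform, scaled by 1/sqrt(n)
    so that the total energy of the coefficients equals sum(v*v for v in x)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        / math.sqrt(n)
        for k in range(n)
    ]


def best_coefficients(x, k):
    """Keep the k largest-magnitude coefficients (assumed meaning of 'best').

    Returns a dict {frequency index: coefficient} and the energy of the
    coefficients that were dropped."""
    coeffs = dft(x)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    omitted_energy = sum(abs(coeffs[i]) ** 2 for i in order[k:])
    return kept, omitted_energy


def dominant_period(x):
    """Period, in samples, of the strongest non-constant frequency."""
    n = len(x)
    coeffs = dft(x)
    # Skip index 0 (the mean) and the mirrored upper half of the spectrum.
    strongest = max(range(1, n // 2 + 1), key=lambda i: abs(coeffs[i]))
    return n / strongest


def bursts(x, window=3, num_std=1.5):
    """Indices whose trailing moving average exceeds the moving average's
    mean by more than num_std standard deviations (illustrative rule)."""
    averages = []
    for t in range(len(x)):
        chunk = x[max(0, t - window + 1): t + 1]
        averages.append(sum(chunk) / len(chunk))
    mean = sum(averages) / len(averages)
    std = math.sqrt(sum((v - mean) ** 2 for v in averages) / len(averages))
    return [t for t, v in enumerate(averages) if v > mean + num_std * std]
```

On a synthetic four-week series with a weekly cycle, `dominant_period` reports a period of 7 days, and keeping the three largest coefficients leaves essentially no omitted energy. Because the moving average is trailing, reported burst indices lag the raw spike by up to `window - 1` days. A similarity search could compare two series through their kept coefficients and use the omitted energy to bound the error of that approximation; that step is left out here because the abstract does not specify how the bound is computed.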
