Conference: International Conference on Theory and Practice of Digital Libraries

Segmenting User Sessions in Search Engine Query Logs Leveraging Word Embeddings


Abstract

Segmenting user sessions in search engine query logs is important for understanding information needs and assessing how well they are satisfied, for improving the quality of search engine rankings, and for better directing content to particular users. Most previous methods use human judgments to inform supervised learning algorithms, and/or apply global thresholds on temporal proximity and on simple lexical similarity metrics. This paper proposes a novel unsupervised method that improves on the current state of the art by leveraging additional heuristics and similarity metrics derived from word embeddings. Specifically, we extend a previous approach that combines temporal and lexical similarity measurements, integrating semantic similarity components that use pre-trained FastText embeddings. The paper reports on experiments with an AOL query dataset used in previous studies, containing a total of 10,235 queries from 215 unique users, grouped into 4,253 sessions (an average of 2.4 queries per session). The results attest to the effectiveness of the proposed method, which outperforms a large set of baselines that are also unsupervised techniques.
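To make the general idea concrete, the following is a minimal sketch of session segmentation that combines a temporal gap with lexical and embedding-based similarity. It is not the method proposed in the paper: the function names, the 30-minute gap, the 0.5 similarity cutoff, the use of Jaccard overlap, and the tiny two-dimensional vectors (a stand-in for pre-trained FastText embeddings) are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta

# Hypothetical 2-d vectors standing in for pre-trained word embeddings.
TOY_VECTORS = {
    "cheap": (0.9, 0.1),
    "flights": (0.8, 0.2),
    "airfare": (0.85, 0.15),
    "lisbon": (0.7, 0.3),
    "pasta": (0.05, 0.95),
    "recipe": (0.1, 0.9),
}


def jaccard(a, b):
    """Lexical similarity: token overlap between two queries."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def embed(query):
    """Average the vectors of known tokens; None if no token is known."""
    vecs = [TOY_VECTORS[t] for t in query.split() if t in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def segment(log, max_gap=timedelta(minutes=30), min_sim=0.5):
    """Split one user's time-ordered (timestamp, query) log into sessions.

    A query stays in the current session only if it is close in time to the
    previous query AND similar to it, lexically or semantically.
    Both thresholds are illustrative assumptions.
    """
    sessions = []
    for ts, query in log:
        if sessions:
            prev_ts, prev_query = sessions[-1][-1]
            u, v = embed(prev_query), embed(query)
            semantic = cosine(u, v) if u is not None and v is not None else 0.0
            similarity = max(jaccard(prev_query, query), semantic)
            if ts - prev_ts <= max_gap and similarity >= min_sim:
                sessions[-1].append((ts, query))
                continue
        sessions.append([(ts, query)])
    return sessions


if __name__ == "__main__":
    t0 = datetime(2006, 3, 1, 9, 0)
    log = [
        (t0, "cheap flights lisbon"),
        (t0 + timedelta(minutes=2), "airfare lisbon"),
        (t0 + timedelta(minutes=5), "pasta recipe"),
        (t0 + timedelta(hours=2), "pasta recipe easy"),
    ]
    for session in segment(log):
        print([q for _, q in session])
```

In this toy log, "airfare lisbon" shares only one token with the preceding query, so lexical overlap alone would split them, while the embedding similarity keeps them together. The last query is lexically close to its predecessor but falls outside the time window, so it opens a new session.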
