Conference: 19th International World Wide Web Conference 2010

Exploring Web Scale Language Models for Search Query Processing



Abstract

It has been widely observed that search queries are composed in a very different style from that of the body or the title of a document. Many techniques that explicitly account for this discrepancy in language style have shown promising results for information retrieval, yet a large-scale analysis of the extent of the language differences has been lacking. In this paper, we present an extensive study of this issue by examining the language model properties of search queries and of the three text streams associated with each web document: the body, the title, and the anchor text. Our information-theoretic analysis shows that queries seem to be composed in a way most similar to how authors summarize documents in anchor texts or titles, offering a quantitative explanation for the observations in past work.

We apply these web-scale n-gram language models to three search query processing (SQP) tasks: query spelling correction, query bracketing, and long query segmentation. By controlling the size and the order of the different language models, we find the perplexity metric to be a good accuracy indicator for these query processing tasks. We show that using smoothed language models yields significant accuracy gains, for instance in query bracketing, compared with using web counts as in the literature. We also demonstrate that applying web-scale language models can have a marked accuracy advantage over smaller ones.
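To make the perplexity comparison concrete, the following is a minimal sketch of how the perplexity of a query set could be measured under bigram models trained on different text streams. It is not the paper's implementation: the abstract does not say which smoothing method is used, so add-one (Laplace) smoothing is assumed here purely for simplicity, and the function names `train_bigram` and `perplexity` as well as the toy title-like and body-like corpora are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized lines.

    Each line is wrapped in <s> ... </s> boundary markers.
    """
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        tokens = ["<s>"] + line.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(queries, unigrams, bigrams):
    """Perplexity of the queries under an add-one smoothed bigram model.

    Assumption: add-one smoothing stands in for whatever smoothing the
    paper actually uses; it only keeps unseen bigrams from having zero
    probability.
    """
    # Possible next words: every seen type except <s>, plus one slot for unknowns.
    vocab_size = len(set(unigrams) - {"<s>"}) + 1
    log_prob, n_events = 0.0, 0
    for query in queries:
        tokens = ["<s>"] + query.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
            log_prob += math.log2(p)
            n_events += 1
    return 2 ** (-log_prob / n_events)


# Toy data, invented for illustration only.
title_like = ["cheap flights paris", "paris hotel deals"]
body_like = [
    "we offer a wide range of flights to paris at low prices",
    "book your hotel in paris with our exclusive deals",
]
queries = ["cheap flights paris"]

ppl_title = perplexity(queries, *train_bigram(title_like))
ppl_body = perplexity(queries, *train_bigram(body_like))
print(f"title-like model: {ppl_title:.2f}, body-like model: {ppl_body:.2f}")
```

On this toy data the title-like model assigns the query a lower perplexity than the body-like model, which is the kind of comparison the abstract describes. The numbers themselves carry no empirical weight.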
