Conference: International Conference on Computational Linguistics

Bridging the Gap between Intrinsic and Perceived Relevance in Snippet Generation

Abstract

Snippet generation plays an important role in a search engine. A good snippet gives users a clear indication of the main content of a search result as it relates to the query, and of whether relevant information can be found in it. Previous studies on snippet generation focused on selecting sentences that are related to both the query and the document. However, the resulting snippet may look highly relevant while the document itself is not. A factor that has not been considered is the consistency between the relevance perceived by the user when reading the snippet and the intrinsic relevance of the document. This factor is important for avoiding a seemingly relevant snippet for an irrelevant document, and vice versa. In this paper, we incorporate this factor into a snippet generation method by imposing the constraint that the snippet of a more relevant document should also be more relevant to the query. We derive a set of pairwise preferences between sentences from relevance judgments, then use this set to train a gradient boosting decision tree that models the sentence scoring function used in snippet generation. Compared with existing snippet generation methods and with the snippets generated by a commercial search engine, our snippets are more consistent with the true relevance of the documents. When the snippets are incorporated into a document ranking function, we also observe a significant improvement in retrieval effectiveness. This study shows the importance of generating snippets that indicate the right level of relevance of the search results.
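
To make the pairwise-preference idea concrete, here is a minimal Python sketch of one way such preferences could be derived from graded relevance judgments. The abstract does not say how the pairs are actually built, so everything here is an assumption for illustration: the `JudgedDoc` type, the rule that every candidate sentence of a higher-graded document is preferred over every candidate sentence of a lower-graded one, and the `pairwise_agreement` helper. In the paper the scoring function is a trained gradient boosting decision tree; the sketch accepts any callable in its place.

```python
from itertools import combinations, product
from typing import Callable, List, NamedTuple, Sequence, Tuple


class JudgedDoc(NamedTuple):
    """Hypothetical container: one judged document for a single query."""
    doc_id: str
    grade: int                 # relevance judgment, higher means more relevant
    sentences: Sequence[str]   # candidate snippet sentences from the document


def derive_sentence_preferences(docs: Sequence[JudgedDoc]) -> List[Tuple[str, str]]:
    """Return (preferred, other) sentence pairs for one query.

    Assumed rule (not taken from the paper): a sentence from a document
    with a higher grade is preferred over a sentence from a document
    with a lower grade. Documents with equal grades yield no pairs.
    """
    pairs: List[Tuple[str, str]] = []
    for a, b in combinations(docs, 2):
        if a.grade == b.grade:
            continue
        hi, lo = (a, b) if a.grade > b.grade else (b, a)
        pairs.extend(product(hi.sentences, lo.sentences))
    return pairs


def pairwise_agreement(score: Callable[[str], float],
                       pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of preference pairs that `score` orders correctly."""
    if not pairs:
        return 0.0
    correct = sum(score(preferred) > score(other) for preferred, other in pairs)
    return correct / len(pairs)
```

A scoring model trained on such pairs can then be checked with `pairwise_agreement`: a value near 1.0 means the sentences it favors tend to come from the more relevant documents, which is the consistency property the abstract argues for.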
