Query Type Classification for Web Document Retrieval

Abstract

The heterogeneity of the Web exacerbates information retrieval (IR) problems, and short user queries make them worse. The content of web documents alone is not enough to find good answer documents; link information and URL information compensate for the insufficiency of content information. However, a static combination of multiple sources of evidence may lower retrieval performance, so different strategies are needed to find target documents depending on the query type. User queries can be classified into three categories: the topic relevance task, the homepage finding task, and the service finding task. In this paper, a user query classification scheme is proposed. The scheme uses differences in distribution, mutual information, the usage rate as anchor text, and part-of-speech (POS) information for classification. After a user query has been classified, different algorithms and information are applied to obtain better results: for the topic relevance task, content information is emphasized, whereas for the homepage finding task, link information and URL information are emphasized. The best performance was obtained when the proposed classification method was used together with the OKAPI scoring algorithm.
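The abstract names the classification features and the type-dependent weighting but gives no formulas. The Python sketch below illustrates the general idea under explicit assumptions, and is not the paper's method: it uses only one feature (anchor-text usage rate, approximated here as the fraction of a term's occurrences that fall in anchor text), distinguishes only topic relevance from homepage finding (the service finding class and the distribution, mutual-information, and POS features are omitted), scores content with a common BM25 variant of the Okapi function, and mixes content and link/URL scores with per-type weights. The names `classify_query`, `bm25`, and `combined_score`, the 0.5 threshold, and the weights are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical per-type weights: (content weight, link/URL weight).
WEIGHTS = {"topic": (0.9, 0.1), "homepage": (0.3, 0.7)}


def anchor_usage_rate(term, anchor_counts, body_counts):
    """Fraction of a term's occurrences found in anchor text (assumed feature)."""
    in_anchor = anchor_counts.get(term, 0)
    total = in_anchor + body_counts.get(term, 0)
    return in_anchor / total if total else 0.0


def classify_query(query, anchor_counts, body_counts, threshold=0.5):
    """Label a query 'homepage' if its terms are mostly used as anchor text,
    otherwise 'topic'. The threshold is a hypothetical choice."""
    terms = query.lower().split()
    if not terms:
        return "topic"
    avg_rate = sum(anchor_usage_rate(t, anchor_counts, body_counts) for t in terms) / len(terms)
    return "homepage" if avg_rate >= threshold else "topic"


def bm25(query_terms, doc_terms, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 content score; the idf uses the '+1' variant so it stays positive."""
    tf = Counter(doc_terms)
    dl = len(doc_terms)
    score = 0.0
    for t in query_terms:
        f = tf.get(t, 0)
        if f == 0:
            continue
        n_t = df.get(t, 0)
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
    return score


def combined_score(query_type, content_score, link_url_score):
    """Weight content vs. link/URL evidence according to the query type."""
    w_content, w_link = WEIGHTS[query_type]
    return w_content * content_score + w_link * link_url_score
```

For example, with anchor counts `{"ibm": 80, "homepage": 50}` and body counts `{"ibm": 20, "homepage": 50}`, the query "IBM homepage" has an average anchor usage rate of 0.65 and is routed to the homepage weights, so link and URL evidence dominate its final score. A static combination would instead apply one fixed pair of weights to every query, which is what the abstract argues against.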