Journal: IBM Journal of Research and Development

Finding needles in the haystack: Search and candidate generation

Abstract

A key phase in the DeepQA architecture is Hypothesis Generation, in which candidate system responses are generated for downstream scoring and ranking. In the IBM Watson™ system, these hypotheses are potential answers to Jeopardy!™ questions and are generated by two components: search and candidate generation. The search component retrieves content relevant to a given question from Watson's knowledge resources. The candidate generation component identifies potential answers to the question from the retrieved content. In this paper, we present strategies developed to use characteristics of Watson's different knowledge sources and to formulate effective search queries against those sources. We further discuss a suite of candidate generation strategies that use various kinds of metadata, such as document titles or anchor texts in hyperlinked documents. We demonstrate that a combination of these strategies brings the correct answer into the candidate answer pool for 87.17% of all the questions in a blind test set, facilitating high end-to-end question-answering performance.
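As a rough illustration of two ideas in the abstract, the sketch below builds a candidate pool from document titles and anchor texts, then measures the fraction of questions whose pool contains the correct answer. It is not Watson's implementation: `Passage`, `generate_candidates`, and `candidate_recall` are hypothetical names, and the case-insensitive exact string match is an assumption made only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Passage:
    """A retrieved document (hypothetical structure, not from the paper)."""
    title: str
    text: str
    anchor_texts: List[str] = field(default_factory=list)


def generate_candidates(passages: List[Passage]) -> List[str]:
    """Collect candidates from titles and anchor texts, in first-seen order."""
    seen = set()
    candidates = []
    for passage in passages:
        for cand in [passage.title, *passage.anchor_texts]:
            cleaned = cand.strip()
            key = cleaned.lower()
            # Skip empty strings and case-insensitive duplicates.
            if key and key not in seen:
                seen.add(key)
                candidates.append(cleaned)
    return candidates


def candidate_recall(pools: List[List[str]], gold_answers: List[str]) -> float:
    """Fraction of questions whose pool contains the gold answer.

    Matching is case-insensitive exact string equality (an assumption).
    """
    if not gold_answers:
        return 0.0
    hits = 0
    for pool, gold in zip(pools, gold_answers):
        normalized = {c.lower() for c in pool}
        if gold.strip().lower() in normalized:
            hits += 1
    return hits / len(gold_answers)
```

With one candidate pool per question, `candidate_recall(pools, gold_answers)` returns a value between 0 and 1. The abstract's 87.17% figure is a measure of this kind, although the paper's own answer-matching criteria may differ from the exact match assumed here.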