ACM SIGIR Forum

Evaluating Retrieval Performance for Japanese Question Answering: What Are Best Passages?

Abstract

Question Answering (QA) has recently received attention from the information retrieval, information extraction, machine learning and natural language processing communities. While traditional Information Retrieval (IR) systems return a list of documents, recent QA systems are tackling the problem of returning short, exact answers in response to open-domain, fact-based questions. TREC started the English QA track at TREC-8 (though systems were to return text snippets instead of exact answers up to TREC-10), and NTCIR started the Japanese QA track at NTCIR-3. The popular approach to QA is the combination of passage retrieval and information extraction. Passage retrieval is used for selecting texts that match the terms extracted from the input question, and information extraction is used for extracting candidate answers from the texts. An important question here is how to define a passage: Long passages (e.g. whole documents) may introduce much noise at the answer selection stage, whereas using short passages (e.g. a few sentences) may imply failure to retrieve texts that contain good answers. How a passage should be defined depends primarily on how the search terms extracted from the question are distributed over each document. At NTCIR-3 QAC1 (Question Answering Challenge 1), a collection of Japanese newspaper articles was used as the knowledge source. Many participants treated each paragraph as a passage, as paragraph boundaries were explicitly given in the newspaper CD-ROM data. This paper questions this popular approach by automatically generating a document retrieval test collection from the QAC1 Question Answering test collection and comparing retrieval performances of five different passage types.
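
The trade-off the abstract describes, between long passages that add noise at answer selection and short passages that miss good answers, can be made concrete with a minimal sketch. The Python snippet below is illustrative only: the three segmenters, the whitespace tokenization, and the term-overlap score are assumptions for exposition, not the five passage types or the retrieval model actually evaluated in the paper.

```python
from collections import Counter
from typing import Callable

def whole_document(doc: str) -> list[str]:
    # Longest passage type: the entire document is one passage.
    return [doc]

def paragraphs(doc: str) -> list[str]:
    # Paragraph-level passages, using blank lines as explicit boundaries
    # (analogous to the paragraph markers in the newspaper CD-ROM data).
    return [p.strip() for p in doc.split("\n\n") if p.strip()]

def sentence_windows(doc: str, size: int = 3) -> list[str]:
    # Short passages: overlapping windows of `size` sentences
    # (naive '.'-based splitting; tail windows may be shorter).
    sents = [s.strip() for s in doc.split(".") if s.strip()]
    return [". ".join(sents[i:i + size]) for i in range(len(sents))]

def score(passage: str, question_terms: list[str]) -> float:
    # Toy term-overlap score: total occurrences of question terms
    # in the passage, after lowercasing and whitespace tokenization.
    tokens = Counter(passage.lower().split())
    return sum(tokens[t.lower()] for t in question_terms)

def best_passage(doc: str,
                 question_terms: list[str],
                 segmenter: Callable[[str], list[str]]) -> str:
    # Return the highest-scoring passage under a given passage definition.
    return max(segmenter(doc), key=lambda p: score(p, question_terms))

if __name__ == "__main__":
    doc = ("TREC started the English QA track at TREC-8.\n\n"
           "NTCIR started the Japanese QA track at NTCIR-3.")
    q = ["Japanese", "QA", "track"]
    print(best_passage(doc, q, paragraphs))      # paragraph-level passage
    print(best_passage(doc, q, whole_document))  # whole document as one passage
```

Under this toy scoring, `whole_document` always matches at least as many question terms as any shorter passage but hands far more text to the answer-extraction stage, while `sentence_windows` keeps extraction focused at the risk of losing documents whose matching terms are spread across distant sentences; which granularity wins in practice is exactly the empirical question the paper's five-way comparison addresses.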