On the Reliability of Factoid Question Answering Evaluation

TETSUYA SAKAI

Abstract

This paper compares several existing evaluation metrics for factoid question answering (QA) in terms of stability and sensitivity. It uses the NTCIR-4 QAC2 Japanese factoid QA tasks, together with the Buckley/Voorhees stability method and the Voorhees/Buckley swap method. Our main findings are: (1) For QA evaluation with ranked lists containing up to five answers, two metrics are not as stable and sensitive as reciprocal rank: the fraction of questions with a correct answer within the top 5 (NQcorrect5) and the fraction with a correct answer at rank 1 (NQcorrect1). (2) Q-measure, which can handle multiple correct answers and answer correctness levels, is at least as stable and sensitive as reciprocal rank, provided that a mild gain value assignment is used. Emphasizing answer correctness levels tends to hurt stability and sensitivity, while handling multiple correct answers improves them. Because our experimental methods are language-independent, we believe that these findings also apply to QA in languages other than Japanese.
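
The three simpler per-question metrics named above can be illustrated with a minimal Python sketch. It assumes each question's system output is available as a list of boolean correctness judgments in rank order. This input format is hypothetical, and the function names are illustrative rather than taken from the paper or from any NTCIR evaluation tool. Only binary correctness and the rank of the first correct answer are used, so the sketch does not model the multiple correct answers and correctness levels that Q-measure exploits.

```python
from typing import List, Sequence

# Hypothetical input: one list per question, True where the answer at that rank is correct.
Judgments = Sequence[bool]


def first_correct_rank(judgments: Judgments, cutoff: int = 5) -> int:
    """Return the 1-based rank of the first correct answer within the cutoff, or 0 if none."""
    for rank, correct in enumerate(judgments[:cutoff], start=1):
        if correct:
            return rank
    return 0


def nq_correct(runs: List[Judgments], k: int) -> float:
    """Fraction of questions with at least one correct answer in the top k.

    k=1 corresponds to NQcorrect1 and k=5 to NQcorrect5.
    """
    hits = sum(1 for judgments in runs if first_correct_rank(judgments, cutoff=k) > 0)
    return hits / len(runs)


def mean_reciprocal_rank(runs: List[Judgments], cutoff: int = 5) -> float:
    """Average of 1/rank of the first correct answer (0 when none appears within the cutoff)."""
    total = 0.0
    for judgments in runs:
        rank = first_correct_rank(judgments, cutoff)
        total += 1.0 / rank if rank else 0.0
    return total / len(runs)


if __name__ == "__main__":
    # Four toy questions (invented for illustration): first correct answer at rank 2, rank 1, none, rank 5.
    runs = [
        [False, True, False],
        [True],
        [False] * 5,
        [False, False, False, False, True],
    ]
    print(nq_correct(runs, 1))         # 0.25
    print(nq_correct(runs, 5))         # 0.75
    print(mean_reciprocal_rank(runs))  # 0.425
```

In the toy data, NQcorrect1 and NQcorrect5 each give a question either full credit or none. Reciprocal rank, by contrast, distinguishes a correct answer at rank 2 from one at rank 5. That finer granularity is one plausible intuition for why reciprocal rank might discriminate between systems more reliably.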

Bibliographic Record

Source: ACM Transactions on Asian Language Information Processing, 2007, Issue 1, pp. 3:1-3:23 (23 pages)
Author: Tetsuya Sakai
Affiliation: Knowledge Media Laboratory, Toshiba Corporate R&D Center, 1 Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords: question answering; evaluation metrics

Similar Literature

1. Using IS-A Relation Patterns for Factoid Questions in Question Answering Systems [J]. BOJUN SHIM, YOUNGJOONG KO, JUNGYUN SEO. IEICE Transactions on Information and Systems, 2006, No. 12.
2. Non-Factoid Answer Selection in Indonesian Science Question Answering System using Long Short-Term Memory (LSTM) [J]. Alfi Fauzia Hanifah, Retno Kusumaningrum. Procedia Computer Science, 2021, No. 1.
3. Arabic factoid Question-Answering system for Islamic sciences using normalized corpora [J]. Hajer Maraoui, Kais Haddar, Laurent Romary. Procedia Computer Science, 2021.
4. Building TALAA-AFAQ, a Corpus of Arabic FActoid Question-Answers for a Question Answering System [C]. Asma Aouichat, Ahmed Guessoum. International Conference on Applications of Natural Language to Information Systems, 2017.
5. Domain Adaptation for Factoid Question Answering [D]. Yoshida, Davis. 2017.
6. Factoid Question Answering with Distant Supervision [O]. Hongzhi Zhang, Xiao Liang, Guangluan Xu. 2018.
7. Study and Implementation of Monolingual Approach on Indonesian Question Answering for Factoid and Non-Factoid Question [O]. Zulen Alvin Andhika, Purwarianti Ayu. 2011.
8. Answering Questions, Questioning Answers: Evaluating Data Quality in an Establishment Survey [R]. Goldenberg, K. L. 2008.