ACM Transactions on Asian Language Information Processing

On the Reliability of Factoid Question Answering Evaluation


Abstract

This paper compares several existing evaluation metrics for factoid question answering (QA) in terms of stability and sensitivity, using the NTCIR-4 QAC2 Japanese factoid QA tasks together with the Buckley/Voorhees stability method and the Voorhees/Buckley swap method. Our main findings are: (1) For QA evaluation with ranked lists containing up to five answers, the fraction of questions with a correct answer within the top 5 (NQcorrect5) and the fraction with a correct answer at rank 1 (NQcorrect1) are not as stable and sensitive as reciprocal rank. (2) Q-measure, which can handle multiple correct answers and answer correctness levels, is at least as stable and sensitive as reciprocal rank, provided that a mild gain value assignment is used. Emphasizing answer correctness levels tends to hurt stability and sensitivity, while handling multiple correct answers improves them. As our experimental methods are language-independent, we believe that these findings apply to QA in languages other than Japanese as well.
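As a rough illustration of the simpler metrics named in the abstract, the sketch below computes reciprocal rank and the NQcorrect-style fractions for a handful of ranked answer lists. The function names, the boolean-list data layout, and the toy judgments are assumptions made for this example; they are not taken from the paper or from the NTCIR-4 QAC2 tooling, and Q-measure is not covered here.

```python
from typing import Sequence


def reciprocal_rank(is_correct: Sequence[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, ok in enumerate(is_correct, start=1):
        if ok:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all questions."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(flags) for flags in runs) / len(runs)


def nq_correct_at(runs: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of questions with at least one correct answer in the top k."""
    if not runs:
        return 0.0
    hits = sum(1 for flags in runs if any(flags[:k]))
    return hits / len(runs)


# Hypothetical judgments: one list per question, up to five ranked answers,
# True where the answer at that rank was judged correct.
toy_runs = [
    [True, False, False, False, False],   # correct at rank 1
    [False, False, True, False, False],   # correct at rank 3
    [False, False, False, False, False],  # no correct answer
    [False, True, False, False, False],   # correct at rank 2
]

print(nq_correct_at(toy_runs, 1))                 # 0.25
print(nq_correct_at(toy_runs, 5))                 # 0.75
print(round(mean_reciprocal_rank(toy_runs), 3))   # 0.458
```

In this toy data the two questions answered at ranks 2 and 3 contribute nothing to the rank-1 fraction and count the same as a rank-1 hit in the top-5 fraction, whereas reciprocal rank gives them partial, rank-dependent credit.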
