ACM Transactions on Information Systems

What Does Affect the Correlation Among Evaluation Measures?

Abstract

Information Retrieval (IR) is well known for its large number of evaluation measures, with new ones proposed more and more frequently. In this context, correlation analysis is the tool used to study evaluation measures: it lets us understand whether two measures rank systems similarly, whether they capture different aspects of system performance or reflect different user models, and whether a new measure is well motivated. To this end, the two most commonly used correlation coefficients are Kendall's τ correlation and the AP correlation τap.

The goal of the article is to investigate the properties of the tool we use to study evaluation measures, that is, correlation analysis. In particular, we investigate three research questions about these two correlation coefficients: (i) what is the effect of the number of systems and topics? (ii) what is the effect of removing low-performing systems? (iii) what is the effect of the experimental collections?

To answer these research questions, we propose a methodology based on the General Linear Mixed Model (GLMM) and ANalysis Of VAriance (ANOVA) to isolate the effects of the number of topics, the number of systems, and the experimental collections, and to let us observe expected correlation values, net of these effects, which are stable and reliable.

We learned that the effect of the number of topics is more prominent than the effect of the number of systems. Even though it produces different absolute values, removing low-performing systems does not seem to provide information substantially different from not removing them, especially when comparing a whole set of evaluation measures. Finally, we found that both document corpora and topic sets affect the correlation among evaluation measures, the effect of the latter being more prominent. Moreover, there is a substantial interaction between evaluation measures, corpora, and topic sets, meaning that the correlation between different evaluation measures can be substantially increased or decreased depending on the corpora and topics at hand.
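
For readers unfamiliar with the two coefficients named in the abstract, the following is a minimal sketch of how Kendall's τ and the AP correlation τap are commonly computed between the system rankings induced by two evaluation measures. It is not taken from the article: the function names `kendall_tau` and `tau_ap` and the toy scores are illustrative assumptions, the formulas follow the usual textbook definitions, and ties are assumed to be absent.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau (tau-a form, assumes no tied scores).

    scores_a[k] and scores_b[k] are the scores that two evaluation
    measures assign to the same system k.
    """
    n = len(scores_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def tau_ap(scores_ref, scores_est):
    """AP correlation (assumes no tied scores, at least two systems).

    Walks down the ranking induced by scores_est; for the system at
    rank i (i >= 2) it takes the fraction of systems ranked above it
    that are also above it according to scores_ref. The coefficient
    is asymmetric: scores_ref plays the role of the reference ranking.
    """
    n = len(scores_ref)
    order = sorted(range(n), key=lambda k: scores_est[k], reverse=True)
    total = 0.0
    for i in range(1, n):
        current = order[i]
        correct = sum(
            1 for above in order[:i] if scores_ref[above] > scores_ref[current]
        )
        total += correct / i
    return 2.0 * total / (n - 1) - 1.0


# Hypothetical scores for four systems under a reference measure.
reference = [0.5, 0.4, 0.3, 0.2]
# Second measure swaps the two best systems.
top_swap = [0.4, 0.5, 0.3, 0.2]
# Second measure swaps the two worst systems.
bottom_swap = [0.5, 0.4, 0.2, 0.3]

print(kendall_tau(reference, top_swap), kendall_tau(reference, bottom_swap))
print(tau_ap(reference, top_swap), tau_ap(reference, bottom_swap))
```

In this toy case Kendall's τ treats both swaps identically, while τap penalizes the swap at the top of the ranking more heavily, which is the top-heaviness that distinguishes the two coefficients.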
