Journal: 電子情報通信学会技術研究報告. デ-タ工学. Data Engineering

How Intuitive Are Diversified Search Metrics? Concordance Test Results for the Diversity U-measures

Abstract

For the past few decades, ranked retrieval (e.g. web search) has been evaluated using rank-based evaluation metrics such as Average Precision and normalised Discounted Cumulative Gain (nDCG). These metrics discount the value of each retrieved relevant document based on its rank. The situation is similar with diversified search which has gained popularity recently: diversity metrics such as α-nDCG, Intent-Aware Expected Reciprocal Rank (ERR-IA) and D#-nDCG are also rank-based. These widely-used evaluation metrics just regard the system output as a list of document IDs, and ignore all other features such as snippets and document full texts of various lengths. The recently-proposed U-measure framework of Sakai and Dou uses the amount of text read by the user as the foundation for discounting the value of relevant information, and can take into account the user's snippet reading and full text reading behaviours. The present study compares the diversity versions of U-measure (D-U and U-IA) with state-of-the-art diversity metrics in terms of how "intuitive" they are: given a pair of ranked lists, we quantify the ability of each metric to favour the more diversified and more relevant list by means of the concordance test. Our results show that while D#-nDCG is the overall winner in terms of simultaneous concordance with diversity and relevance, D-U and U-IA statistically significantly outperform other state-of-the-art metrics. Moreover, in terms of concordance with relevance alone, D-U and U-IA significantly outperform all rank-based diversity metrics. These results suggest that D-U and U-IA are not only more realistic than rank-based metrics but also intuitive, i.e., that they measure what we want to measure.
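
The concordance test mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that, for one topic, each metric assigns a score to every ranked list, that only list pairs on which the two candidate metrics strictly disagree are counted, and that a metric is "concordant" on such a pair when its preference does not contradict any of the simple gold measures. The function name `concordance_test`, the gold measures `intent_recall` and `precision`, the tie handling, and all scores are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def concordance_test(metric_a, metric_b, gold_metrics):
    """Compare two metrics against gold measures on one topic.

    metric_a, metric_b: dicts mapping a ranked-list id to its score.
    gold_metrics: sequence of dicts with the same keys (assumed gold measures).
    Returns (disagreements, concordant_a, concordant_b).
    """
    disagreements = concordant_a = concordant_b = 0
    for x, y in combinations(sorted(metric_a), 2):
        pref_a = sign(metric_a[x] - metric_a[y])
        pref_b = sign(metric_b[x] - metric_b[y])
        if pref_a * pref_b >= 0:
            continue  # the two metrics do not strictly disagree on this pair
        disagreements += 1
        gold_prefs = [sign(g[x] - g[y]) for g in gold_metrics]
        # Assumption: a gold tie (preference 0) never counts as a contradiction.
        if all(pref_a * gp >= 0 for gp in gold_prefs):
            concordant_a += 1
        if all(pref_b * gp >= 0 for gp in gold_prefs):
            concordant_b += 1
    return disagreements, concordant_a, concordant_b


# Made-up scores for three ranked lists on a single topic.
metric_a = {"r1": 0.8, "r2": 0.5, "r3": 0.6}
metric_b = {"r1": 0.4, "r2": 0.7, "r3": 0.5}
intent_recall = {"r1": 1.0, "r2": 0.5, "r3": 0.5}  # stand-in for diversity
precision = {"r1": 0.6, "r2": 0.4, "r3": 0.4}      # stand-in for relevance

result = concordance_test(metric_a, metric_b, [intent_recall, precision])
print(result)  # (disagreements, concordant_a, concordant_b)
```

Passing both gold measures checks simultaneous concordance with diversity and relevance, and passing only `[precision]` checks concordance with relevance alone. In a real study the counts would be accumulated over many topics and list pairs and then tested for statistical significance, which this sketch leaves out.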