Conference: Asia Information Retrieval Societies Conference

Revisiting the Evaluation of Diversified Search Evaluation Metrics with User Preferences

Abstract

To validate the credibility of diversity evaluation metrics, diversified search evaluation studies use a number of methods that "evaluate evaluation metrics", such as Kendall's τ, Discriminative Power, and the Intuitiveness Test. These methods have been widely adopted and have yielded much insight into the effectiveness of evaluation metrics. However, they rest on assumptions about particular types of user behavior or on statistical assumptions, and they do not take users' actual search preferences into account. Using multi-grade user preference judgments collected for diversified search result lists displayed in parallel, we take user preferences as the ground truth to investigate the evaluation of diversity metrics. We find that user preferences at the subtopic level yield results similar to those at the topic level, which means that future experiments can use topic-level preferences with much less human effort. We further find that most existing evaluation metrics correlate well with user preferences for result lists with large performance differences, regardless of whether the difference is detected by the metric or by the users. Based on these findings, we propose a preference-weighted correlation, the Multi-grade User Preference (MUP) method, to evaluate diversity metrics based on user preferences. The experimental results reveal that MUP evaluates diversity metrics from real users' perspective, which may differ from that of other methods. In addition, we find that, in our experiments, the relevance of the search results is more important than their diversity in diversified search evaluation.
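The abstract does not give the MUP formula, so the following is only an illustrative sketch of the two ideas it names: a plain Kendall's τ between two sets of scores, and a hypothetical preference-weighted agreement in which each pair of result lists counts in proportion to the strength of the users' preference. The function names, the signed-grade encoding (positive means list A is preferred), and the weighting scheme are all assumptions made for illustration, not the authors' definition.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    n = len(a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def preference_weighted_agreement(metric_deltas, preference_grades):
    """Hypothetical preference-weighted agreement (NOT the paper's MUP formula).

    metric_deltas[k]:     metric(list A) - metric(list B) for list pair k.
    preference_grades[k]: signed multi-grade user preference for pair k
                          (positive = A preferred, negative = B preferred).
    Each pair is weighted by the absolute preference grade; the result
    lies in [-1, 1].
    """
    total_weight = sum(abs(g) for g in preference_grades)
    if total_weight == 0:
        return 0.0
    score = 0.0
    for delta, grade in zip(metric_deltas, preference_grades):
        if delta * grade > 0:      # metric and users agree on the better list
            score += abs(grade)
        elif delta * grade < 0:    # metric and users disagree
            score -= abs(grade)
    return score / total_weight
```

Under this assumed weighting, a metric that disagrees with users only on weakly preferred pairs is penalized less than one that disagrees on strongly preferred pairs, which is in the spirit of the finding that metrics track user preferences best when performance differences are large.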
