Published in: Natural Language Processing and Information Systems

Ranked-Listed or Categorized Results in IR: 2 Is Better Than 1

Abstract

In this paper we examine the performance of both ranked-listed and categorized results in the context of known-item search (target testing). Performance of known-item search is easy to quantify based on the number of examined documents and class descriptions. Results are reported on a subset of the Open Directory classification hierarchy, which enables us to control the error rate and investigate how performance degrades with error. Three types of simulated user model are identified, together with the two operating scenarios of correct and incorrect classification. Extensive empirical testing reveals that in the ideal scenario, i.e. perfect classification by both human and machine, a category-based system significantly outperforms a ranked list for all but the best queries, i.e. queries for which the target document was initially retrieved in the top 5. When either human or machine error occurs and the user follows a search strategy that is exclusively category based, performance is much worse than for a ranked list. Most interestingly, however, if the user follows a hybrid strategy of first looking in the expected category and then reverting to the ranked list if the target is absent, performance can remain significantly better than for a ranked list, even with misclassification rates as high as 30%. We also observe that under this hybrid strategy performance degrades gracefully with error rate.
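The three search strategies compared in the abstract can be illustrated with a small sketch. This is not the paper's simulation: the cost model (one unit per examined document or class description, every class description read once, and a full rescan of the ranked list on fallback) and all function names and toy data are assumptions made for illustration only.

```python
# Illustrative cost model for known-item search (assumptions, not the paper's code):
# each examined document and each class description costs one unit.

def ranked_list_cost(ranked, target):
    """Documents examined when scanning the ranked list top-down to the target."""
    return ranked.index(target) + 1


def _category_view(ranked, categories, expected_category):
    """Documents of the expected category, kept in ranked order."""
    return [d for d in ranked if categories[d] == expected_category]


def category_only_cost(ranked, categories, target, expected_category):
    """Cost of a purely category-based search; None if the target is never found."""
    n_classes = len(set(categories.values()))  # assumption: every class label is read once
    in_category = _category_view(ranked, categories, expected_category)
    if target not in in_category:
        return None  # misclassification: the category-only user fails
    return n_classes + in_category.index(target) + 1


def hybrid_cost(ranked, categories, target, expected_category):
    """Look in the expected category first; fall back to the ranked list if absent."""
    n_classes = len(set(categories.values()))
    in_category = _category_view(ranked, categories, expected_category)
    if target in in_category:
        return n_classes + in_category.index(target) + 1
    # Assumption: the whole category is examined in vain, then the ranked
    # list is rescanned from the top (a pessimistic choice).
    return n_classes + len(in_category) + ranked_list_cost(ranked, target)


# Hypothetical toy data: ten retrieved documents in ranked order.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
correct = {
    "d1": "Arts", "d2": "Arts", "d3": "Arts", "d4": "Sports", "d5": "Arts",
    "d6": "Sports", "d7": "Sports", "d8": "Science", "d9": "Science", "d10": "Science",
}
misclassified = dict(correct, d8="Arts")  # the target d8 is filed under the wrong class

print(ranked_list_cost(ranked, "d8"))                              # 8
print(category_only_cost(ranked, correct, "d8", "Science"))        # 4
print(hybrid_cost(ranked, correct, "d8", "Science"))               # 4
print(category_only_cost(ranked, misclassified, "d8", "Science"))  # None
print(hybrid_cost(ranked, misclassified, "d8", "Science"))         # 13
```

In this toy case the category view halves the cost when classification is correct. Under misclassification the category-only user never finds the target, while the hybrid user pays a bounded penalty on top of the ranked-list cost. How often that penalty is incurred depends on the error rate, which the sketch does not model.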