International Conference on Database and Expert Systems Applications

Analyzing Document Retrievability in Patent Retrieval Settings

Abstract

Most information retrieval settings, such as web search, are precision-oriented: they focus on retrieving a small number of highly relevant documents. In specific domains such as patent retrieval or law, however, recall matters more than precision. There the goal is to find all relevant documents, so algorithms must be tuned towards recall at the cost of precision. This raises important questions about retrievability and search engine bias: depending on how the similarity between a query and a document is measured, certain documents may be more or less retrievable in certain systems, to the point that some documents cannot be retrieved at all under common threshold settings. Biases may be oriented towards document popularity (e.g. by increasing the weight of references), towards document length, towards rare or common words, or towards structural information such as metadata or headings. Existing accessibility measurement techniques are limited because they measure retrievability with respect to all possible queries. In this paper, we improve accessibility measurement by considering sets of relevant and irrelevant queries for each document. This simulates how recall-oriented users create their queries when searching for relevant information. We evaluate retrievability scores using a corpus of patents from the US Patent and Trademark Office.
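As background for the retrievability and bias discussion above, here is a minimal Python sketch of the cumulative retrievability score commonly used in the literature, r(d) = Σ_q o_q · f(k_dq, c), where f is 1 if document d appears in the top c results for query q and 0 otherwise, together with the Gini coefficient often used to summarize how unevenly retrievability is spread across a collection. This is not the per-document relevant/irrelevant query method proposed in this paper. The function names `retrievability` and `gini`, the 0/1 cutoff utility, and the toy rankings are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def retrievability(
    rankings: Dict[str, List[str]],
    collection: Iterable[str],
    cutoff: int,
    query_weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Cumulative retrievability r(d) = sum over queries q of o_q * f(k_dq, c).

    rankings: query id -> ranked list of document ids returned by the system.
    collection: every document id, so never-retrieved documents score 0.
    cutoff: rank threshold c; f(k_dq, c) = 1 if d is in the top c, else 0.
    query_weights: optional o_q per query; if None every query weighs 1,
        otherwise queries missing from the dict weigh 0.
    """
    scores = {doc_id: 0.0 for doc_id in collection}
    for query_id, ranked_docs in rankings.items():
        weight = 1.0 if query_weights is None else query_weights.get(query_id, 0.0)
        # Only documents ranked within the cutoff count as "retrieved".
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    return scores


def gini(values: Sequence[float]) -> float:
    """Gini coefficient: 0 = perfectly even scores, near 1 = highly skewed."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form over ascending values with 1-based rank i:
    # sum((2i - n - 1) * x_i) / (n * sum(x))
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


if __name__ == "__main__":
    # Toy rankings for three queries over a four-document collection (illustrative only).
    runs = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d3", "d2"], "q3": ["d1", "d2"]}
    scores = retrievability(runs, ["d1", "d2", "d3", "d4"], cutoff=2)
    print(scores)  # {'d1': 3.0, 'd2': 2.0, 'd3': 1.0, 'd4': 0.0}
    print(round(gini(list(scores.values())), 3))  # 0.417
```

Document d4 is never retrieved within the cutoff, which is the kind of unreachable document the abstract warns about. To approximate a per-document view, one could pass only the rankings of the queries considered relevant (or irrelevant) to a given document and read off that document's score; how the paper actually constructs those query sets is not specified here.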