Conference: Joint Conference on Knowledge-based Software Engineering

Using Bibliographic Knowledge for Ranking in Scientific Publication Databases


Abstract

Document ranking for scientific publications relies on a variety of specialized resources (e.g. author or citation indexes) that are difficult to exploit in standard general-purpose search engines. Such engines typically operate on large-scale, heterogeneous document collections in which these resources are not available for every document. Integrating such resources into specialized information retrieval engines is therefore important for meeting community-specific user expectations, which strongly influence how relevance is perceived within the community in question. From this perspective, this paper extends the notion of ranking with several methods that exploit different types of bibliographic knowledge, a crucial resource for measuring the relevance of scientific publications. We experimentally evaluated the adequacy of two such ranking methods for information retrieval from the CERN scientific publication database in the domain of particle physics: one based on freshness (i.e. the publication date), and one based on a novel index, the download-Hirsch index, which is derived from download frequencies. Our experiments show that (i) both specialized ranking methods are promising candidates for extending the baseline ranking (which relies on download frequency), as both lead to fairly small overlaps in search results; and (ii) extending the baseline ranking with the freshness-based method significantly improves retrieval quality when a local rank sum is used for aggregation: a 16.2% relative increase in Mean Reciprocal Rank, and a 5.1% relative increase in Success@10, i.e. the estimated probability of finding at least one relevant document among the top ten retrieved.
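
The ranking, aggregation, and evaluation steps above can be sketched in Python. This is a minimal, non-authoritative sketch: the abstract does not define the "local rank sum", so we assume it is a Borda-style sum of each document's positions across the retrieved top-k lists, with a document missing from a list assigned rank len(list) + 1. Mean Reciprocal Rank and Success@10 follow their standard information retrieval definitions. All function names (`rank_by_score`, `local_rank_sum`, `mean_reciprocal_rank`, `success_at_k`, `relative_increase`) and the toy data are hypothetical.

```python
from datetime import date
from typing import Any, List, Mapping, Sequence, Set


def rank_by_score(scores: Mapping[str, Any]) -> List[str]:
    """Order documents by descending score (download count, or publication date for freshness)."""
    return sorted(scores, key=scores.__getitem__, reverse=True)


def local_rank_sum(rankings: Sequence[List[str]]) -> List[str]:
    """Merge top-k result lists by summing each document's rank positions.

    ASSUMPTION: a document absent from a list gets rank len(list) + 1 there.
    Lower sums come first; ties keep first-appearance order (sorted is stable).
    """
    positions = [{doc: i for i, doc in enumerate(r, start=1)} for r in rankings]
    candidates = list(dict.fromkeys(doc for r in rankings for doc in r))

    def rank_sum(doc: str) -> int:
        return sum(pos.get(doc, len(r) + 1) for pos, r in zip(positions, rankings))

    return sorted(candidates, key=rank_sum)


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant document, or 0.0 if none is retrieved."""
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def mean_reciprocal_rank(runs: List[List[str]], qrels: List[Set[str]]) -> float:
    """Average reciprocal rank over queries (assumes at least one query)."""
    return sum(reciprocal_rank(r, rel) for r, rel in zip(runs, qrels)) / len(runs)


def success_at_k(runs: List[List[str]], qrels: List[Set[str]], k: int = 10) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for r, rel in zip(runs, qrels) if any(d in rel for d in r[:k]))
    return hits / len(runs)


def relative_increase(new: float, old: float) -> float:
    """Relative change (new - old) / old, e.g. 0.162 for a 16.2% increase."""
    return (new - old) / old


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    baseline = rank_by_score({"a": 120, "b": 90, "c": 40})  # by download count
    freshness = rank_by_score(
        {"c": date(2011, 5, 1), "a": date(2010, 3, 1), "d": date(2009, 1, 1)}
    )  # newest first
    merged = local_rank_sum([baseline, freshness])
    print(merged)  # ['a', 'c', 'b', 'd']
    runs, qrels = [merged, ["x", "y"]], [{"c"}, {"z"}]
    print(mean_reciprocal_rank(runs, qrels), success_at_k(runs, qrels))  # 0.25 0.5
```

Placing absent documents just below the end of each list keeps the aggregation local to the retrieved results, since no full ranking of the collection is needed; other penalties for missing documents would be equally plausible under these assumptions.
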
We plan to validate these results further through additional experiments with the ranking method based on the download-Hirsch index, with the aim of improving the performance of our aggregative approach.
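
The abstract describes the download-Hirsch index only as being based on download frequencies. A minimal sketch, assuming it is defined by analogy with Hirsch's h-index with download counts in place of citation counts: an author's index is the largest h such that h of their papers were each downloaded at least h times. How a per-author value becomes a per-document score is not stated; the hypothetical `rank_by_author_h` below scores each paper by the highest index among its authors. The functions `h_index`, `download_h_index_by_author`, and `rank_by_author_h`, and the toy data, are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def h_index(counts: Iterable[int]) -> int:
    """Largest h such that at least h of the counts are >= h (Hirsch's h-index)."""
    h = 0
    for i, count in enumerate(sorted(counts, reverse=True), start=1):
        if count < i:
            break
        h = i
    return h


def download_h_index_by_author(
    downloads: Dict[str, int], authors: Dict[str, List[str]]
) -> Dict[str, int]:
    """ASSUMPTION: h-index computed over each author's per-paper download counts."""
    per_author: Dict[str, List[int]] = defaultdict(list)
    for paper, paper_authors in authors.items():
        for author in paper_authors:
            per_author[author].append(downloads.get(paper, 0))
    return {author: h_index(counts) for author, counts in per_author.items()}


def rank_by_author_h(
    papers: List[str], authors: Dict[str, List[str]], author_h: Dict[str, int]
) -> List[str]:
    """HYPOTHETICAL document score: the highest download-h-index among a paper's authors.

    sorted() stays stable with reverse=True, so equal scores keep input order.
    """
    def score(paper: str) -> int:
        return max((author_h.get(a, 0) for a in authors.get(paper, [])), default=0)

    return sorted(papers, key=score, reverse=True)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    downloads = {"p1": 5, "p2": 3, "p3": 1, "p4": 10}
    authors = {"p1": ["alice"], "p2": ["alice", "bob"], "p3": ["bob"], "p4": ["carol"]}
    author_h = download_h_index_by_author(downloads, authors)
    print(author_h)  # {'alice': 2, 'bob': 1, 'carol': 1}
    print(rank_by_author_h(["p3", "p4", "p1", "p2"], authors, author_h))  # ['p1', 'p2', 'p3', 'p4']
```

The max-over-authors rule is only one option; a sum or mean over a paper's authors would be equally defensible under these assumptions.
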