Venue: IEEE International Conference on Web Services

A Web Service for Author Name Disambiguation in Scholarly Databases

Abstract

Author Name Disambiguation (AND) is the task of clustering unique author names from publication records in scholarly or related databases. Although AND has been extensively studied and has served as an important preprocessing step for several tasks (e.g., calculating bibliometrics and scientometrics for authors), there are few publicly available tools for disambiguation in large-scale scholarly databases. Furthermore, most of the disambiguated data is embedded within the search engines of the scholarly databases, and existing application programming interfaces (APIs) have limited features and are often unavailable to users for various reasons. This makes it difficult for researchers and developers to use the data for applications (e.g., author search) or research. Here, we design a novel, web-based, RESTful API for searching disambiguated authors, using the PubMed database as a sample application. We offer two types of queries, attribute-based and record-based, which serve different purposes. Attribute-based queries retrieve authors by the attributes available in the database. We study different search engines to find the most appropriate one for processing attribute-based queries. Record-based queries retrieve the authors that are most likely to have written a query publication provided by a user. To accelerate record-based queries, we develop a novel algorithm that performs a fast record-to-cluster match. We show that our algorithm accelerates the query by a factor of 4.01 compared to a naive baseline approach.
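The abstract does not describe how the record-to-cluster match works, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each disambiguated author (cluster) keeps a merged coauthor profile, that candidates are narrowed by a "last name plus first initial" block key, and that similarity is plain Jaccard overlap. All class names, field names, and the scoring choice are hypothetical. The point it illustrates is that a query record is compared once per candidate cluster rather than once per stored record.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One author mention on a publication (hypothetical fields)."""
    author_name: str        # "Last First" or "Last F"
    coauthors: frozenset    # names of the other authors on the paper


@dataclass
class Cluster:
    """A disambiguated author: its records plus a merged coauthor profile."""
    cluster_id: str
    records: list = field(default_factory=list)
    coauthor_profile: set = field(default_factory=set)

    def add(self, record):
        self.records.append(record)
        self.coauthor_profile |= record.coauthors


def block_key(name):
    """Reduce a name to 'last-name first-initial' so variants share a block."""
    last, _, first = name.strip().lower().partition(" ")
    return f"{last} {first[:1]}"


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_index(clusters):
    """Map each block key to the clusters holding a record with that key."""
    index = defaultdict(list)
    for cluster in clusters:
        for key in {block_key(r.author_name) for r in cluster.records}:
            index[key].append(cluster)
    return index


def match_record(query, index):
    """Rank candidate clusters for a query record, best match first.

    One comparison per candidate cluster (against its merged profile),
    instead of one comparison per stored record as a naive scan would do.
    """
    candidates = index.get(block_key(query.author_name), [])
    scored = [
        (c.cluster_id, jaccard(query.coauthors, c.coauthor_profile))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data only; these names do not come from PubMed.
smith_1 = Cluster("smith_j_1")
smith_1.add(Record("Smith J", frozenset({"Lee K", "Wang H"})))
smith_1.add(Record("Smith J", frozenset({"Lee K", "Patel R"})))
smith_2 = Cluster("smith_j_2")
smith_2.add(Record("Smith John", frozenset({"Garcia M"})))
jones_1 = Cluster("jones_a_1")
jones_1.add(Record("Jones A", frozenset({"Lee K"})))

index = build_index([smith_1, smith_2, jones_1])
query = Record("Smith J", frozenset({"Lee K", "Wang H"}))
ranked = match_record(query, index)
```

In this sketch `ranked` lists only the two "smith j" clusters, with `smith_j_1` first. A real system would likely combine more evidence (affiliation, venue, title terms) and a learned similarity. The profile-based score is also an approximation and need not agree with a per-record comparison.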