Published in: IEEE International Conference on Web Services

A Web Service for Author Name Disambiguation in Scholarly Databases

Abstract

Author Name Disambiguation (AND) is the task of clustering unique author names from publication records in scholarly or related databases. Although AND has been extensively studied and has served as an important preprocessing step for several tasks (e.g., calculating bibliometrics and scientometrics for authors), there are few publicly available tools for disambiguation in large-scale scholarly databases. Furthermore, most of the disambiguated data is embedded within the search engines of the scholarly databases, and existing application programming interfaces (APIs) have limited features and are often unavailable to users for various reasons. This makes it difficult for researchers and developers to use the data for various applications (e.g., author search) or research. Here, we design a novel, web-based, RESTful API for searching disambiguated authors, using the PubMed database as a sample application. We offer two types of queries, attribute-based queries and record-based queries, which serve different purposes. Attribute-based queries retrieve authors by the attributes available in the database. We study different search engines to find the most appropriate one for processing attribute-based queries. Record-based queries retrieve the authors that are most likely to have written a query publication provided by a user. To accelerate record-based queries, we develop a novel algorithm that performs a fast record-to-cluster match. We show that our algorithm can accelerate the query by a factor of 4.01 compared to a baseline naive approach.
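The difference between the two query types can be illustrated with a small in-memory sketch. This is not the paper's API or its record-to-cluster algorithm, neither of which is specified in the abstract. The `AuthorCluster` schema, the `AuthorIndex` class, the blocking key (last name plus first initial), the scoring rule, and the toy data are all assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AuthorCluster:
    """One disambiguated author, summarised from its records (hypothetical schema)."""
    author_id: str
    name: str
    affiliations: set = field(default_factory=set)
    coauthors: set = field(default_factory=set)
    title_words: set = field(default_factory=set)


def block_key(name: str) -> str:
    """Assumed blocking key: lowercase last name plus first initial, e.g. 'smith_j'."""
    parts = name.lower().split()
    return f"{parts[-1]}_{parts[0][0]}"


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class AuthorIndex:
    def __init__(self, clusters):
        self.clusters = list(clusters)
        # Group clusters by blocking key so a record query only scores
        # clusters whose name could plausibly match.
        self.blocks = defaultdict(list)
        for c in self.clusters:
            self.blocks[block_key(c.name)].append(c)

    def attribute_query(self, name=None, affiliation=None):
        """Attribute-based query: filter clusters by stored attributes."""
        hits = []
        for c in self.clusters:
            if name and name.lower() not in c.name.lower():
                continue
            if affiliation and not any(
                affiliation.lower() in a.lower() for a in c.affiliations
            ):
                continue
            hits.append(c.author_id)
        return hits

    def record_query(self, author_name, coauthors, title):
        """Record-based query: rank candidate clusters for one publication record."""
        words = set(title.lower().split())
        scored = []
        for c in self.blocks.get(block_key(author_name), []):
            # Toy score: shared coauthors plus title-word overlap.
            score = len(c.coauthors & set(coauthors)) + jaccard(words, c.title_words)
            scored.append((score, c.author_id))
        # Highest score first; ties broken by author_id for determinism.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [author_id for _, author_id in scored]


# Toy data, invented for this example.
index = AuthorIndex([
    AuthorCluster("A1", "Jane Smith", {"Example University"}, {"Kim K"},
                  {"author", "name", "disambiguation"}),
    AuthorCluster("A2", "John Smith", {"Sample Institute"}, {"Lee C"},
                  {"protein", "folding"}),
    AuthorCluster("A3", "Maria Garcia", {"Example University"}, {"Kim K"},
                  {"web", "services"}),
])

by_attributes = index.attribute_query(name="smith", affiliation="sample")
by_record = index.record_query("J Smith", ["Lee C"], "Protein folding dynamics")
```

In this sketch the record query compares the query publication against one summary per cluster inside a single name block, rather than against every stored record. That is one plausible way a record-to-cluster match can avoid a naive full scan, though the paper's actual method may differ.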