Dynamic author name disambiguation for growing digital libraries

Qian Yanan; Zheng Qinghua; Sakai Tetsuya; Ye Junting; Liu Jun

Abstract

When a digital library user searches for publications by an author name, she often sees a mixture of publications by different authors who share that name. As digital libraries grow and involve more authors, this author ambiguity problem is becoming critical. Author disambiguation (AD) often tries to solve this problem by leveraging metadata such as coauthors, research topics, publication venues and citation information, since more personal information, such as contact details, is often restricted or missing. In this paper, we study how to efficiently disambiguate author names given an incessant stream of published papers. To this end, we propose a "BatchAD+IncAD" framework for dynamic author disambiguation. First, we perform batch author disambiguation (BatchAD) to disambiguate all author names at a given time by grouping all records (each record refers to a paper with one of its author names) into disjoint clusters. This establishes a one-to-one mapping between the clusters and real-world authors. Then, for newly added papers, we periodically perform incremental author disambiguation (IncAD), which determines whether each new record can be assigned to an existing cluster or to a new cluster not yet included in the previous data. Based on the new data, IncAD also tries to correct previous AD results. Our main contributions are: (1) We demonstrate with real data that a small number of new papers often share author names with a large portion of existing papers, so it is challenging for IncAD to leverage previous AD results effectively. (2) We propose a novel IncAD model that aggregates metadata from a cluster of records to estimate the author's profile, such as her coauthor and keyword distributions, in order to predict how likely it is that a new record is "produced" by the author. (3) Using two labeled datasets and one large-scale raw dataset, we show that the proposed method is much more efficient than state-of-the-art methods while ensuring high accuracy.
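
The abstract describes the IncAD step only at a high level. As a minimal sketch under stated assumptions, the assignment decision can be pictured like this: each cluster keeps coauthor and keyword counts as its author profile, a new record is scored by an add-alpha smoothed, naive-Bayes-style multinomial likelihood under every cluster with the same name, and the record opens a new cluster when no profile clears a threshold. The names `Record`, `Cluster`, `log_likelihood` and `assign`, the smoothing, the per-item averaging and the `new_author_threshold` cut-off are all hypothetical. This is not the paper's model, and it leaves out BatchAD and the correction of earlier results.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical sketch of profile-based incremental assignment; not the paper's IncAD model.


@dataclass
class Record:
    """One (paper, author name) pair, following the abstract's notion of a record."""
    name: str
    coauthors: list[str]
    keywords: list[str]


@dataclass
class Cluster:
    """Records assumed to belong to one real-world author, plus an aggregated profile."""
    name: str
    coauthor_counts: Counter = field(default_factory=Counter)
    keyword_counts: Counter = field(default_factory=Counter)
    size: int = 0

    def add(self, rec: Record) -> None:
        # Fold the record's metadata into the author profile.
        self.coauthor_counts.update(rec.coauthors)
        self.keyword_counts.update(rec.keywords)
        self.size += 1


def log_likelihood(items: list[str], counts: Counter, vocab: int, alpha: float) -> float:
    # Add-alpha smoothed multinomial log-probability of `items` under one profile.
    total = sum(counts.values())
    denom = total + alpha * vocab
    return sum(math.log((counts[x] + alpha) / denom) for x in items)


def assign(rec: Record, clusters: list[Cluster], vocab: int = 10_000,
           alpha: float = 0.01, new_author_threshold: float = -6.0) -> Cluster:
    """Attach `rec` to the most likely same-name cluster, or open a new one.

    Scores are averaged per item so records with many coauthors/keywords are not
    penalized; `new_author_threshold` is a hypothetical cut-off, not from the paper.
    """
    n_items = max(1, len(rec.coauthors) + len(rec.keywords))
    best, best_score = None, new_author_threshold
    for c in clusters:
        if c.name != rec.name:  # only clusters sharing the ambiguous name compete
            continue
        score = (log_likelihood(rec.coauthors, c.coauthor_counts, vocab, alpha)
                 + log_likelihood(rec.keywords, c.keyword_counts, vocab, alpha)) / n_items
        if score > best_score:
            best, best_score = c, score
    if best is None:  # no existing profile is likely enough: treat as a new author
        best = Cluster(name=rec.name)
        clusters.append(best)
    best.add(rec)
    return best


if __name__ == "__main__":
    # Two existing clusters for the ambiguous name "J. Liu", seeded by hand here
    # as a stand-in for a BatchAD result.
    a, b = Cluster("J. Liu"), Cluster("J. Liu")
    a.add(Record("J. Liu", ["Q. Zheng"], ["clustering", "digital library"]))
    b.add(Record("J. Liu", ["X. Wang"], ["robotics", "control"]))
    clusters = [a, b]
    print(assign(Record("J. Liu", ["Q. Zheng"], ["clustering"]), clusters) is a)  # True
    assign(Record("J. Liu", ["Z. Chen"], ["biology"]), clusters)
    print(len(clusters))  # 3: the unfamiliar record opened a new cluster
```

Restricting each comparison to clusters with exactly the same name string keeps every assignment cheap in this sketch; handling name variants such as initials or transliterations would need a separate matching step.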

Bibliographic details

Source: Information retrieval, 2015, No. 5, pp. 379-412 (34 pages)
Authors: Qian Yanan; Zheng Qinghua; Sakai Tetsuya; Ye Junting; Liu Jun
Affiliations:
Xi'an Jiaotong University, Department of Computer Science and Technology, Xi'an 710049, People's Republic of China (Qian Yanan, Zheng Qinghua, Ye Junting, Liu Jun)
Waseda University, Department of Computer Science and Engineering, Tokyo, Japan (Sakai Tetsuya)
Indexing information
Original format: PDF
Language: English
Keywords: Digital library; Author disambiguation; Data stream; Clustering; Multi-classification

Similar documents

1. Presenting results as dynamically generated co-authorship subgraphs in semantic digital library collections [J]. Powell James, McMahon Tamara M., Mane Ketan, Miller Laniece, Collins Linn. Code4Lib Journal. 2011, No. 16
2. A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Digital Libraries [J]. D-Lib Magazine. 2013, No. 19
3. A Unified Probabilistic Framework for Name Disambiguation in Digital Library [J]. Jie Tang. IEEE Transactions on Knowledge and Data Engineering. 2012, No. 6
4. Large scale author name disambiguation in digital libraries [C]. Khabsa Madian, Treeratpituk Pucktada, Giles C. Lee. IEEE International Congress on Big Data. 2014
5. Parallel inverted index for large-scale, dynamic digital libraries [D]. Sornil Ohm. 2001
6. PEIR Digital Library: Online Resources and Authoring System [O]. Kristopher N. Jones, Dwain E. Woode, Kristina Panizzi. 2001
7. Large Scale Author Name Disambiguation in Digital Libraries [O]. Madian Khabsa, Pucktada Treeratpituk, C. Lee Giles. 2015
8. Subthreshold Digital Library Using a Dynamic-Threshold Metal-Oxide Semiconductor (DTMOS) and Transmission Gate Logic [R]. Lee, T. C., Proie, R. M. 2014