...
Journal: Scientific Programming

Name Disambiguation Based on Graph Convolutional Network

Abstract

Recently, massive online academic resources have made scientific study and research more convenient. Yet author name ambiguity degrades the user experience when retrieving documents from literature databases. Mainstream name disambiguation approaches extract features from papers and compute similarities for clustering. They fall into two branches: clustering based on attribute features and clustering based on linkage information. Neither branch, however, achieves high performance. We propose a name disambiguation method based on a Graph Convolutional Network (GCN). It aims to improve the efficiency of literature retrieval and to provide technical support for building literature databases accurately. The GCN-based disambiguation model designed in this paper combines both attribute features and linkage information. We first build paper-to-paper graphs, coauthor graphs, and paper-to-author graphs for each reference item of a name. The nodes in the graphs carry attribute features, and the edges carry linkage features. The graphs are then fed to a specialized GCN, which outputs a hybrid representation. Finally, we use a hierarchical clustering algorithm to divide the papers into disjoint clusters. Experimental results show that the proposed model achieves an average F1 score of 77.10% on three name disambiguation datasets. We then improve the original GCN model with an attention mechanism. This lets the model automatically select the appropriate number of convolution layers and adapt to the structures of different local graphs. Compared with the original GCN model, the improved model increases the average precision and F1 score by 2.05% and 0.63%, respectively. In addition, we build a bilingual dataset, BAT, which contains various forms of academic achievements and can serve as an alternative dataset for future name disambiguation research.
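The pipeline in the abstract has three stages: graph construction, GCN encoding, and hierarchical clustering. The minimal, dependency-free Python sketch below illustrates them. It is not the authors' model, and it rests on several assumptions:

- It uses a single paper-to-paper graph in which an edge means two papers share a coauthor.
- It applies the standard GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W).
- It substitutes identity matrices for trained weights.
- It models the attention over layers as a hypothetical per-node softmax over layer outputs, scored by a fixed `query` vector.

The clustering step is average-linkage agglomerative clustering on cosine distance with a given number of clusters. `pairwise_f1` assumes the pairwise F1 definition common in name disambiguation work, and the paper's exact metric may differ. All function names and the toy data are hypothetical.

```python
import math

def normalize_adjacency(n, edges):
    """Return the GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops (A + I)
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # undirected edge between two papers
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain-list matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_forward(a_hat, x, weights):
    """Apply H_{l+1} = ReLU(A_hat H_l W_l) once per weight matrix; return all layer outputs."""
    outputs, h = [x], x
    for w in weights:
        h = [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]
        outputs.append(h)
    return outputs

def layer_attention(outputs, query):
    """Hypothetical layer attention: per node, softmax-weight each layer's output
    by its dot product with `query`, then sum (all layers must share a width)."""
    combined = []
    for i in range(len(outputs[0])):
        scores = [sum(q * v for q, v in zip(query, h[i])) for h in outputs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        width = len(outputs[0][i])
        combined.append([sum(e / total * h[i][d] for e, h in zip(exps, outputs))
                         for d in range(width)])
    return combined

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def average_linkage(vectors, k):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    cosine distance until only k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [cosine_distance(vectors[i], vectors[j])
                         for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def pairwise_f1(predicted, truth):
    """F1 over unordered paper pairs that share a cluster."""
    def same_pairs(clusters):
        return {(i, j) for c in clusters for i in c for j in c if i < j}
    pred, gold = same_pairs(predicted), same_pairs(truth)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy input (hypothetical): six papers under one ambiguous name, two real authors.
features = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.0, 1.0]]
coauthor_edges = [(0, 1), (1, 2), (3, 4), (4, 5)]  # papers that share a coauthor
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for trained layer weights

a_hat = normalize_adjacency(len(features), coauthor_edges)
layers = gcn_forward(a_hat, features, [identity, identity])
embeddings = layer_attention(layers, query=[1.0, 1.0])
clusters = average_linkage(embeddings, k=2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
print(pairwise_f1(clusters, [[0, 1, 2], [3, 4, 5]]))  # 1.0
```

Here the two coauthor components never touch, so neighborhood smoothing keeps each author's papers together, and the sketch recovers both authors. In a real system the weights and attention parameters would be learned. The number of clusters would also have to be estimated rather than supplied as `k`.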