Published in: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Strong Baselines for Author Name Disambiguation with and Without Neural Networks


Abstract

Author name disambiguation (AND) is one of the most important problems in scientometrics, and it has become increasingly challenging with the rapid growth of academic digital libraries. Existing approaches rely heavily on complex clustering-style architectures. They usually either assume that the number of clusters is known beforehand or predict it with a separate model, which adds further complexity and running time. In this paper, we combine simple neural networks with two sets of heuristic rules to explore strong baselines for author name disambiguation that require no prior knowledge or estimate of the number of clusters, which frees the model from unnecessary complexity. On the popular AMiner benchmark dataset, our solution significantly outperforms several state-of-the-art methods in both performance and efficiency, and it remains comparable to many complex models even when using only a group of rules. Experimental results also indicate that the gains from sophisticated deep learning techniques are quite modest for author name disambiguation.
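To make the idea of rule-based disambiguation without a preset cluster count concrete, the following is a minimal sketch, not the method from the paper. The abstract does not state the actual rules, so the single rule used here (two papers under the same ambiguous name are merged when they share at least one coauthor) is an assumption chosen purely for illustration, as are the function name `disambiguate`, the parameter `min_shared`, and the record fields `id` and `coauthors`. The point it shows is that the number of clusters falls out of the merging process rather than being supplied in advance.

```python
from collections import defaultdict


def disambiguate(papers, min_shared=1):
    """Group papers written under one ambiguous author name.

    `papers` is a list of dicts with an "id" and a list of "coauthors".
    The merge rule (shared coauthors) is a hypothetical example rule,
    not one taken from the paper.
    """
    parent = list(range(len(papers)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    # Apply the rule to every pair; merges are transitive via union-find.
    for i in range(len(papers)):
        for j in range(i + 1, len(papers)):
            shared = set(papers[i]["coauthors"]) & set(papers[j]["coauthors"])
            if len(shared) >= min_shared:
                union(i, j)

    # Collect paper ids per root; the cluster count is never specified.
    clusters = defaultdict(list)
    for index, paper in enumerate(papers):
        clusters[find(index)].append(paper["id"])
    return sorted(clusters.values())


papers = [
    {"id": "p1", "coauthors": ["A", "B"]},
    {"id": "p2", "coauthors": ["B", "C"]},
    {"id": "p3", "coauthors": ["D"]},
    {"id": "p4", "coauthors": ["C"]},
]
result = disambiguate(papers)
```

In this toy input, p1 and p2 share coauthor B and p2 and p4 share coauthor C, so the three end up in one group through transitive merging, while p3 stays on its own. A learned similarity score from a simple neural network could stand in for, or complement, the hand-written rule inside the pairwise loop.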
