Conference: ACM/IEEE-CS Joint Conference on Digital Libraries

Active Associative Sampling for Author Name Disambiguation


Abstract

One of the hardest problems faced by current scholarly digital libraries is author name ambiguity. This problem occurs when a set of citation records contains records of the same author under distinct names, or records belonging to distinct authors with similar names. Among the several proposed methods, the most effective ones seem to be those that directly assign records to their respective authors using supervised machine learning techniques. The effectiveness of such methods is usually directly correlated with the amount of labeled training data available. However, acquiring training examples requires skilled human annotators to manually label references. Aiming to reduce the number of examples needed to produce the training data, in this paper we propose a new active sampling strategy based on association rules for the author name disambiguation task. We compare our strategy with state-of-the-art supervised baselines that use the complete labeled training dataset, as well as with other active methods, and show that very competitive disambiguation effectiveness can be obtained with reductions in the training set of up to 71%.
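The abstract does not spell out the sampling procedure, so the following is only an illustrative sketch of what an association-rule-driven active sampler could look like, not the authors' algorithm. It assumes each citation record is a set of feature tokens, that rules have the simple form "feature -> author" with a confidence threshold, and that a record is worth sending to a human annotator when few rules mined from the already labeled records apply to it. The names `mine_rules`, `active_sample`, `min_conf`, and `min_rules`, as well as the toy records, are hypothetical.

```python
from collections import Counter


def mine_rules(labeled, min_conf=0.5):
    """Mine single-feature rules (feature -> author) from labeled records.

    `labeled` is a list of (features, author) pairs. A rule is kept when its
    confidence, count(feature, author) / count(feature), is >= min_conf.
    """
    feature_count = Counter()
    pair_count = Counter()
    for features, author in labeled:
        for f in features:
            feature_count[f] += 1
            pair_count[(f, author)] += 1
    return {
        (f, a): n / feature_count[f]
        for (f, a), n in pair_count.items()
        if n / feature_count[f] >= min_conf
    }


def active_sample(records, oracle, min_rules=1):
    """Greedily choose which records to label (assumed selection criterion).

    Repeatedly picks the unlabeled record with the fewest applicable rules and
    asks `oracle` (the human annotator) for its author. Stops once every
    remaining record is covered by at least `min_rules` rules.
    """
    labeled = []        # (features, author) pairs labeled so far
    selected = []       # indices of the records sent to the annotator
    remaining = list(range(len(records)))
    while remaining:
        rules = mine_rules(labeled)
        coverage = {
            i: sum(1 for (f, _author) in rules if f in records[i])
            for i in remaining
        }
        pick = min(remaining, key=lambda i: coverage[i])
        if coverage[pick] >= min_rules:
            break  # the current rules already say something about every record
        labeled.append((records[pick], oracle(pick)))
        selected.append(pick)
        remaining.remove(pick)
    return selected


# Toy data (invented for illustration): "c:" coauthor, "v:" venue, "t:" title word.
records = [
    frozenset({"c:smith", "v:jcdl"}),
    frozenset({"c:smith", "t:library"}),
    frozenset({"c:lee", "v:sigir"}),
    frozenset({"c:lee", "t:retrieval"}),
]
authors = ["a1", "a1", "a2", "a2"]  # ground truth the annotator would supply

selected = active_sample(records, oracle=lambda i: authors[i])
print(selected)  # [0, 2]
```

In this toy run only two of the four records need a manual label, because the remaining two share a coauthor feature with an already labeled record. A real system would use richer rules and a classifier trained on the selected subset.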
