
Disambiguating authors in citations on the web and authorship correlations



Abstract

Members of the academic community have increasingly turned to digital libraries to search for the latest work of their peers. On account of their role in the academic community, it is very important that these digital libraries collect citations in a consistent, accurate, and up-to-date manner, yet they do not correctly compile citations for myriads of authors for various reasons, including authors who share the same name, a problem known as the "name ambiguity problem." This problem occurs when multiple authors share the same name, and particularly when names are simplified, as in cases where names contain merely the first initial and the last name. This paper proposes a reliable and accurate pair-wise similarities approach to disambiguate names using supervised classification on Web correlations and authorship correlations. This approach makes use of Web correlations among citations, assuming that citations that co-refer on publication lists on the Web should refer to the same author. This approach also makes use of authorship correlations, assuming that citations with the same rare author name refer to the same author and, furthermore, that citations with the same full names of authors or e-mail addresses likely refer to the same author. These two types of correlations are measured in our approach using pair-wise similarity metrics. In addition, a binary classifier, as part of supervised classification, is applied to label matching pairs of citations using pair-wise similarity metrics, and these labels are then used to group citations into different clusters such that each cluster represents an individual author. Results show our approach greatly improves upon the name disambiguation accuracy and performance of other proposed approaches, especially in some name clusters with a high degree of ambiguity.
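The pipeline the abstract outlines (pair-wise similarity metrics, a binary match decision per pair, then grouping matched pairs into clusters) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the citation schema (`pages`, `coauthors`, `emails`), the Jaccard features, the 0.5 threshold, and the fixed rule in `is_match`, which merely stands in for the trained supervised classifier the abstract refers to. The rare-name correlation is omitted.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pair_features(c1, c2):
    """Pair-wise similarity features for two citations (hypothetical schema)."""
    return {
        # Web correlation: do both citations appear on the same publication lists?
        "web": jaccard(c1["pages"], c2["pages"]),
        # Authorship correlation: shared full names of co-authors.
        "coauthors": jaccard(c1["coauthors"], c2["coauthors"]),
        # Authorship correlation: any shared e-mail address.
        "email": 1.0 if c1["emails"] & c2["emails"] else 0.0,
    }

def is_match(features, threshold=0.5):
    """Stand-in for a trained binary classifier: a fixed rule, not a learned model."""
    return (features["email"] == 1.0
            or max(features["web"], features["coauthors"]) >= threshold)

def cluster(citations):
    """Group citations by merging every pair labelled as a match (union-find)."""
    parent = list(range(len(citations)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(citations)), 2):
        if is_match(pair_features(citations[i], citations[j])):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(citations)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Made-up citations that all carry the same ambiguous author name.
citations = [
    {"pages": {"http://example.org/~a/pubs"}, "coauthors": {"Wei Chen"}, "emails": set()},
    {"pages": {"http://example.org/~a/pubs"}, "coauthors": {"Li Wang"}, "emails": {"a@example.org"}},
    {"pages": set(), "coauthors": {"Maria Lopez"}, "emails": {"a@example.org"}},
    {"pages": {"http://example.net/~b/list"}, "coauthors": {"John Smith"}, "emails": set()},
]
clusters = cluster(citations)
print(clusters)  # [[0, 1, 2], [3]]
```

In this toy data, citations 0 and 1 are linked by a shared publication list and citations 1 and 2 by a shared e-mail address, so the three end up in one cluster even though 0 and 2 share no direct evidence; citation 3 stays on its own. A real system would replace `is_match` with a classifier trained on labelled citation pairs.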

Bibliographic details

  • Source
    Expert systems with applications | 2012, Issue 12 | pp. 10521-10532 | 12 pages
  • Author affiliations

    Institute of Information Science, Academia Sinica, Taiwan, Department of Computer Science and Information Engineering, National Taiwan University, Taiwan;

    Institute of Information Science, Academia Sinica, Taiwan;

    Institute of Information Science, Academia Sinica, Taiwan, Department of Computer Science and Information Engineering, National Taiwan Normal University, Taiwan;

    Institute of Information Science, Academia Sinica, Taiwan, Department of Computer Science and Information Engineering, National Taiwan University, Taiwan;

  • Original format: PDF
  • Language of text: English
  • Keywords

    author name disambiguation; citation analysis; web correlation; authorship correlation;

