Conference: 9th ACM/IEEE Joint Conference on Digital Libraries, 2009

Using Web Information for Author Name Disambiguation


Abstract

In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We propose here a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations and submitting queries to a Web search engine, aiming at finding curricula vitae and Web pages containing publications of the ambiguous authors. From the content of documents in the answer sets returned by the Web search engine, useful information that can help in the disambiguation process is extracted. Using this information, author names are disambiguated by leveraging a hierarchical clustering method that groups citations that appear in the same document together in a bottom-up fashion. Experimental results show that our method yields results that outperform those of two state-of-the-art unsupervised methods and are statistically comparable with those of a supervised one, while requiring no training. We observe gains of up to 65.2% in the pairwise F1 metric when compared with our best unsupervised baseline method.
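
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each citation has already been mapped to the set of Web documents (for example, curricula vitae or publication pages) in which a search engine found it. It then merges citations bottom-up whenever their clusters share a document, and scores the result with pairwise F1. The function names, the citation identifiers, and the `example.org` URLs are all hypothetical.

```python
from itertools import combinations


def cluster_by_shared_documents(citation_docs):
    """Bottom-up grouping: start with one cluster per citation and keep
    merging any two clusters whose citations share a Web document.

    citation_docs maps a citation id to the set of document URLs in which
    that citation was found (an assumed input, not from the paper).
    """
    clusters = [({cid}, set(docs)) for cid, docs in citation_docs.items()]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if clusters[i][1] & clusters[j][1]:  # at least one shared document
                citations = clusters[i][0] | clusters[j][0]
                documents = clusters[i][1] | clusters[j][1]
                clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                clusters.append((citations, documents))
                merged = True
                break  # restart the scan over the updated cluster list
    return [citations for citations, _ in clusters]


def pairwise_f1(predicted, truth):
    """Pairwise F1: precision and recall over pairs of citations that are
    placed in the same cluster."""
    def same_cluster_pairs(clusters):
        return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    true_positives = len(pred_pairs & true_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: c1..c3 are linked through two pages of one author,
# c4 sits on another author's page, and c5 was not found on the Web at all.
citation_docs = {
    "c1": {"http://example.org/cv_a"},
    "c2": {"http://example.org/cv_a", "http://example.org/pubs_a"},
    "c3": {"http://example.org/pubs_a"},
    "c4": {"http://example.org/cv_b"},
    "c5": set(),
}
clusters = cluster_by_shared_documents(citation_docs)
truth = [{"c1", "c2", "c3", "c5"}, {"c4"}]
score = pairwise_f1(clusters, truth)
```

In this toy case, `c5` stays in its own cluster because no document supports merging it, which lowers recall while precision stays perfect. A practical system would need some fallback for citations that the search engine does not return.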
