
A method for eliminating articles by homonymous authors from the large number of articles retrieved by author search



Abstract

This paper proposes a method for discriminating articles by the target authors (“true” articles) from those by other, homonymous authors (“false” articles). Author name searches for 2,595 “source” authors in six subject fields retrieved about 629,000 articles. To extract the true articles from this large set, which included many false ones, two filtering stages were applied. In the first stage, a retrieved article was eliminated as false if either its affiliation addresses had little similarity to those of its source article or there was no citation relationship between the journal of the retrieved article and that of its source article. In the second stage, a sample of retrieved articles was judged manually, and the judgment results were used to define discrimination functions based on logistic regression. These discrimination functions achieved recall and precision of about 95% each and accuracy (correct answer ratio) of 90–95%. The effective discrimination predictors were the existence of common coauthor(s), address similarity, title word similarity, and interjournal citation relationships between the retrieved and source articles. Whether the source author was from a specific country was also an important predictor. Furthermore, a retrieved article was shown to be almost certainly true if it was cited by, or cocited with, its source article. The proposed method should be effective when dealing with a large number of articles whose subject fields and affiliation addresses vary widely.
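The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Pair` feature record, the `ADDRESS_THRESHOLD` cutoff, the intercept, and every weight are hypothetical values chosen only for illustration, since the abstract reports neither the fitted coefficients nor the thresholds. The country predictor and the citation/cocitation shortcut are left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical features comparing a retrieved article with its source article."""
    address_similarity: float         # assumed to lie in [0, 1]
    journal_citation_strength: float  # assumed scale; 0 means no citation relationship
    common_coauthor: bool
    title_similarity: float           # assumed to lie in [0, 1]


ADDRESS_THRESHOLD = 0.2  # illustrative cutoff, not taken from the paper


def passes_first_stage(p: Pair) -> bool:
    # Stage 1: eliminate as false if EITHER the addresses are dissimilar
    # OR the two journals have no citation relationship.
    return (p.address_similarity >= ADDRESS_THRESHOLD
            and p.journal_citation_strength > 0)


def probability_true(p: Pair) -> float:
    # Stage 2: logistic discrimination function with made-up coefficients.
    z = (-3.0
         + 2.5 * p.common_coauthor
         + 3.0 * p.address_similarity
         + 2.0 * p.title_similarity
         + 1.0 * p.journal_citation_strength)
    return 1.0 / (1.0 + math.exp(-z))


def is_true_article(p: Pair, cutoff: float = 0.5) -> bool:
    return passes_first_stage(p) and probability_true(p) >= cutoff


def evaluate(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    # Recall, precision, and accuracy against manual judgments.
    tp = sum(pr and ac for pr, ac in zip(predicted, actual))
    fp = sum(pr and not ac for pr, ac in zip(predicted, actual))
    fn = sum(ac and not pr for pr, ac in zip(predicted, actual))
    correct = sum(pr == ac for pr, ac in zip(predicted, actual))
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": correct / len(actual) if actual else 0.0,
    }
```

Under these illustrative weights, a candidate with a shared coauthor and a closely matching address, such as `Pair(0.9, 0.5, True, 0.4)`, is kept, whereas a candidate with `address_similarity=0.05` is dropped in the first stage regardless of its other features. In practice the coefficients would be fitted to the manually judged sample rather than set by hand.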
