Journal of the American Society for Information Science and Technology

A Method for Eliminating Articles by Homonymous Authors From the Large Number of Articles Retrieved by Author Search

Abstract

This paper proposes a method for distinguishing articles written by the target authors ("true" articles) from those written by other authors with the same name ("false" articles). Author-name searches for 2,595 "source" authors in six subject fields retrieved about 629,000 articles. To extract the true articles from this large set, which contains many false ones, two filtering stages were applied. In the first stage, a retrieved article was eliminated as false if either its affiliation addresses had little similarity to those of its source article or there was no citation relationship between the journal of the retrieved article and the journal of its source article. In the second stage, a sample of the retrieved articles was judged manually, and the judgment results were used to define discrimination functions based on logistic regression. These discrimination functions achieved a recall ratio and a precision of about 95% each, and an accuracy (correct answer ratio) of 90% to 95%. The effective discrimination predictors were the existence of common coauthor(s), address similarity, title-word similarity, and interjournal citation relationships between the retrieved and source articles. Whether the source author was from a specific country was also an important predictor. Furthermore, a retrieved article was shown to be almost certainly true if it was cited by, or cocited with, its source article. The proposed method is expected to be effective for handling large numbers of articles whose subject fields and affiliation addresses vary widely.
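
The two-stage filter described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the similarity measures, thresholds, or fitted coefficients, so the Jaccard overlap of word sets, the log-scaled count of citations exchanged between the two journals, the 0.1 address-similarity cutoff, the 0.5 decision threshold, plain batch gradient-descent fitting, and the `Article` record with its field names are all hypothetical choices. Stage 1 keeps a retrieved article only if its addresses resemble the source article's and the two journals are linked by citation. Stage 2 first accepts any article cited by, or cocited with, its source article, then applies a logistic discrimination function over the five predictors named in the abstract. `evaluate` computes recall, precision, and accuracy against manual judgments.

```python
import math
from dataclasses import dataclass, field

# Hypothetical citation table: (citing_journal, cited_journal) -> number of citations.
JournalCites = dict[tuple[str, str], int]


@dataclass
class Article:
    """Illustrative article record; the field names are assumptions, not the paper's schema."""
    journal: str
    address_words: set[str] = field(default_factory=set)  # normalized affiliation-address tokens
    title_words: set[str] = field(default_factory=set)
    coauthors: set[str] = field(default_factory=set)
    cited_by_source: bool = False      # the source article cites this article
    cocited_with_source: bool = False  # a third article cites both this one and the source


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap in [0, 1]; an assumed stand-in for the paper's similarity measures."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interjournal_cites(j1: str, j2: str, cites: JournalCites) -> int:
    """Citations exchanged between two journals in either direction."""
    if j1 == j2:
        return cites.get((j1, j1), 0)
    return cites.get((j1, j2), 0) + cites.get((j2, j1), 0)


def stage1_keep(ret: Article, src: Article, cites: JournalCites,
                min_addr_sim: float = 0.1) -> bool:
    """Stage 1: eliminate as false if addresses differ OR the journals have no citation link."""
    return (jaccard(ret.address_words, src.address_words) >= min_addr_sim
            and interjournal_cites(ret.journal, src.journal, cites) > 0)


def features(ret: Article, src: Article, cites: JournalCites,
             source_in_country: bool) -> list[float]:
    """The five predictors named in the abstract, encoded in one assumed way."""
    return [
        1.0 if ret.coauthors & src.coauthors else 0.0,                    # common coauthor(s)
        jaccard(ret.address_words, src.address_words),                    # address similarity
        jaccard(ret.title_words, src.title_words),                        # title-word similarity
        math.log1p(interjournal_cites(ret.journal, src.journal, cites)),  # interjournal citation
        1.0 if source_in_country else 0.0,                                # source author's country
    ]


def p_true(x: list[float], w: list[float], b: float) -> float:
    """Logistic discrimination function: estimated probability that the article is true."""
    z = b + sum(wk * xk for wk, xk in zip(w, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights to manually judged samples (y: 1 = true, 0 = false) by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = p_true(xi, w, b) - yi  # derivative of the log-loss with respect to z
            gb += err
            gw = [g + err * v for g, v in zip(gw, xi)]
        w = [wk - lr * g / n for wk, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def is_true_article(ret: Article, src: Article, cites: JournalCites, source_in_country: bool,
                    w: list[float], b: float, threshold: float = 0.5) -> bool:
    """Stage 2 decision for an article that survived stage 1."""
    if ret.cited_by_source or ret.cocited_with_source:
        return True  # reported in the abstract as almost certainly true
    return p_true(features(ret, src, cites, source_in_country), w, b) >= threshold


def evaluate(pred: list[bool], gold: list[bool]) -> tuple[float, float, float]:
    """Recall, precision, and accuracy of predictions against manual judgments."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = sum(p == g for p, g in zip(pred, gold)) / len(gold)
    return recall, precision, accuracy
```

In this sketch, `fit_logistic` would be trained on feature vectors from the manually judged sample and then applied through `is_true_article` to the remaining stage-1 survivors; comparing its output with held-out judgments via `evaluate` yields figures comparable in kind to the recall, precision, and accuracy reported above.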

Bibliographic details

  • Author affiliations

    Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2, Kasuga, Tsukuba, Ibaraki 305-8550, Japan;

    Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2, Kasuga, Tsukuba, Ibaraki 305-8550, Japan;

    Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2, Kasuga, Tsukuba, Ibaraki 305-8550, Japan;

    Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2, Kasuga, Tsukuba, Ibaraki 305-8550, Japan;

    Bioresource Information Division, RIKEN BioResource Center, 3-1-1, Koyadai, Tsukuba, Ibaraki 305-0074, Japan;

    Toho University Medical Media Center, 5-21-16, Omori-Nishi, Ota-ku, Tokyo 143-8540, Japan;

    Toho University Medical Media Center, 5-21-16, Omori-Nishi, Ota-ku, Tokyo 143-8540, Japan;

    Juntendo University Library, 2-2-26, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan;

    Department of Culture and Language, Shokei University, 6-5-1, Nirenoki, Kumamoto 861-8538, Japan;

    International Medical Information Center, 35, Shinanomachi, Shinjuku-ku, Tokyo 160-0016, Japan;

  • Original format: PDF
  • Text language: English