Journal: Library Review

Subject-based retrieval of scientific documents, case study: Retrieval of Information Technology scientific articles



Abstract

Purpose: The purpose of this paper is to introduce an approach for retrieving the set of scientific articles in the field of Information Technology (IT) from a scientific database such as Web of Science (WoS), so that scientometric indices can be applied to them and compared with those of other fields.

Design/methodology/approach: The authors propose a statistical, classification-based approach for extracting IT-related articles. First, a probabilistic model of the subject IT is built using keyphrase extraction techniques. Then, IT-related articles are retrieved from all Iranian papers in WoS using a Bayesian classification scheme: based on the probabilistic IT model, each article in the database is assigned an IT membership probability, and the articles with the highest probabilities are retrieved.

Findings: Through the keyphrase extraction process, the authors extracted a set of IT keyphrases comprising 1,497 terms for the probabilistic model. They evaluated the proposed retrieval approach against two other approaches: a query-based approach, in which articles are retrieved from WoS using queries composed of a limited set of IT keywords, and a research-area-based approach, in which articles are retrieved using the WoS categories and research areas. The evaluation and comparison results show that the proposed approach produces more accurate results while retrieving more IT-related articles.

Research limitations/implications: Although this research is limited to the subject IT, the approach can be generalized to any subject. For multidisciplinary subjects such as IT, however, special attention should be given to the keyphrase extraction phase. This research uses a bigram model, which could also be extended to trigrams.

Originality/value: This paper introduces an integrated approach for retrieving IT-related documents from a collection of scientific documents. The approach has two main phases: building a model that represents the topic IT, and retrieving documents based on that model. The model is based on a set of keyphrases extracted from a collection of IT articles. The extraction technique does not rely on Term Frequency-Inverse Document Frequency (TF-IDF), because almost all of the articles in the collection share the same set of keyphrases. In addition, a probabilistic membership score is defined for retrieving IT articles from a collection of scientific articles.
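The abstract says the keyphrase model deliberately avoids TF-IDF because nearly every article in the IT collection shares the same keyphrases. The minimal sketch below, an illustration rather than the authors' method, shows why that matters. A bigram that occurs in every document of the collection gets an inverse document frequency of log(N/N) = 0, so TF-IDF would zero it out. A plain relative-frequency model, by contrast, keeps it as the strongest IT signal. Several details are assumptions made for illustration: the stopword list, the `min_count` cutoff, the toy documents, and the function names `bigrams`, `idf` and `keyphrase_model`. The paper's actual extraction technique is not described in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the paper's preprocessing is not described.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def bigrams(text: str) -> list[tuple[str, str]]:
    """Lowercase the text and return adjacent word pairs that contain no stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(a, b) for a, b in zip(words, words[1:])
            if a not in STOPWORDS and b not in STOPWORDS]

def idf(term: tuple[str, str], docs: list[str]) -> float:
    """Inverse document frequency log(N / df); assumes the term occurs in some document."""
    df = sum(1 for d in docs if term in bigrams(d))
    return math.log(len(docs) / df)

def keyphrase_model(docs: list[str], min_count: int = 2) -> dict[tuple[str, str], float]:
    """Estimate P(bigram | IT) as relative frequency over the whole IT collection,
    keeping only bigrams seen at least `min_count` times (hypothetical cutoff)."""
    counts = Counter()
    for d in docs:
        counts.update(bigrams(d))
    kept = {bg: c for bg, c in counts.items() if c >= min_count}
    total = sum(kept.values())
    return {bg: c / total for bg, c in kept.items()}

# Toy IT collection (hypothetical data).
it_docs = [
    "Cloud computing and information technology in data mining.",
    "Information technology supports cloud computing services.",
    "Data mining methods for information technology systems.",
]

model = keyphrase_model(it_docs)
print(model)
# 0.0: "information technology" occurs in every document, so TF-IDF would discard it.
print(idf(("information", "technology"), it_docs))
```

Here "information technology" receives the highest model probability (3/7) even though its IDF is zero. This is the situation the abstract describes for a topic-specific collection.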
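In the retrieval phase, each article receives an IT membership probability under a Bayesian classification scheme, and the highest-scoring articles are kept. One possible reading, sketched here under stated assumptions, is a two-class naive Bayes model. Keyphrase probabilities are estimated for an IT collection and for a non-IT background collection with Laplace smoothing. The posterior P(IT | article) is then computed from the summed log-likelihood ratios, and the top k articles are retrieved. The background collection, the smoothing, the uniform prior, the toy data, and the names `train_class`, `it_membership` and `retrieve` are all hypothetical. The abstract does not specify the exact scoring formula.

```python
import math
from collections import Counter

def train_class(docs: list[list[str]], vocab: set[str]) -> dict[str, float]:
    """Laplace-smoothed P(keyphrase | class) over a shared keyphrase vocabulary."""
    counts = Counter(p for phrases in docs for p in phrases if p in vocab)
    total = sum(counts.values())
    return {p: (counts[p] + 1) / (total + len(vocab)) for p in vocab}

def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def it_membership(phrases: list[str], p_it: dict[str, float],
                  p_other: dict[str, float], prior_it: float = 0.5) -> float:
    """Posterior P(IT | article) under a two-class naive Bayes model.
    Keyphrases outside the vocabulary are ignored."""
    log_odds = math.log(prior_it) - math.log(1.0 - prior_it)
    for p in phrases:
        if p in p_it:
            log_odds += math.log(p_it[p]) - math.log(p_other[p])
    return sigmoid(log_odds)

def retrieve(articles: dict[str, list[str]], p_it: dict[str, float],
             p_other: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Score every article and return the k with the highest IT membership probability."""
    scored = [(aid, it_membership(ph, p_it, p_other)) for aid, ph in articles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy training data (hypothetical): extracted keyphrases per article.
it_train = [["information technology", "cloud computing"],
            ["data mining", "information technology"]]
other_train = [["gene expression", "cell biology"],
               ["gene expression", "data mining"]]
vocab = {p for phrases in it_train + other_train for p in phrases}

p_it = train_class(it_train, vocab)
p_other = train_class(other_train, vocab)

# Hypothetical candidate articles from the database, keyed by article ID.
candidates = {
    "A": ["information technology", "cloud computing"],
    "B": ["gene expression", "cell biology"],
    "C": ["data mining"],
}
for aid, score in retrieve(candidates, p_it, p_other, k=2):
    print(aid, round(score, 3))
```

The score is accumulated in log-odds so that long keyphrase lists do not underflow. A phrase shared by both classes, such as "data mining" in this toy data, leaves the score at the prior. Replacing the top-k cut with a probability threshold would be an equally plausible retrieval rule.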
