Subject-based retrieval of scientific documents, case study: Retrieval of Information Technology scientific articles

Mohebi Azadeh; Sedighi Mehri; Zargaran Zahra

Abstract

Purpose: This paper introduces an approach for retrieving a set of scientific articles in the field of Information Technology (IT) from a scientific database such as Web of Science (WoS), so that scientometric indices can be applied to the set and compared with those of other fields.

Design/methodology/approach: The authors propose a statistical classification-based approach for extracting IT-related articles. First, a probabilistic model of the subject IT is built using keyphrase extraction techniques. Then, IT-related articles are retrieved from all Iranian papers in WoS using a Bayesian classification scheme: based on the probabilistic IT model, each article in the database is assigned an IT membership probability, and the articles with the highest probabilities are retrieved.

Findings: The keyphrase extraction process yielded a set of IT keyphrases containing 1,497 terms for the probabilistic model. The authors evaluated the proposed retrieval approach against two alternatives: a query-based approach, in which articles are retrieved from WoS using a set of queries composed of a limited number of IT keywords, and a research area-based approach, in which articles are retrieved using WoS categorizations and research areas. The evaluation and comparison show that the proposed approach produces more accurate results while retrieving more IT-related articles.

Research limitations/implications: Although this research is limited to the IT subject, the approach can be generalized to any subject. However, for multidisciplinary topics such as IT, special attention should be given to the keyphrase extraction phase. A bigram model is used in this research; it can be extended to trigrams.

Originality/value: This paper introduces an integrated approach for retrieving IT-related documents from a collection of scientific documents. The approach has two main phases: building a model that represents the topic IT, and retrieving documents based on that model. The model is based on a set of keyphrases extracted from a collection of IT articles. The extraction technique does not rely on Term Frequency-Inverse Document Frequency (TF-IDF), since almost all of the articles in the collection share the same set of keyphrases. In addition, a probabilistic membership score is defined to retrieve the IT articles from a collection of scientific articles.
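The abstract does not state the model's formulas, so the following is only a minimal sketch of the general idea: bigram keyphrases feed a two-class naive Bayes score, and articles are ranked by their IT membership probability. The function names, the add-alpha smoothing, the equal class prior, the background (non-IT) collection, and the toy texts are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text and return its adjacent word pairs as strings."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def fit_counts(docs):
    """Count bigram occurrences over a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(bigrams(doc))
    return counts


def membership_probability(text, it_counts, bg_counts, prior_it=0.5, alpha=1.0):
    """Naive Bayes posterior P(IT | text) with add-alpha smoothing (assumed)."""
    vocab = set(it_counts) | set(bg_counts)
    it_total = sum(it_counts.values())
    bg_total = sum(bg_counts.values())
    log_odds = math.log(prior_it) - math.log(1.0 - prior_it)
    for phrase in bigrams(text):
        if phrase not in vocab:
            continue  # phrases unseen in both collections carry no evidence
        p_it = (it_counts[phrase] + alpha) / (it_total + alpha * len(vocab))
        p_bg = (bg_counts[phrase] + alpha) / (bg_total + alpha * len(vocab))
        log_odds += math.log(p_it) - math.log(p_bg)
    # Numerically stable logistic function of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def retrieve(articles, it_counts, bg_counts, top_k=1):
    """Return the top_k (probability, article) pairs, highest probability first."""
    scored = [(membership_probability(a, it_counts, bg_counts), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy collections (hypothetical text, not data from the study).
it_docs = [
    "cloud computing and information security in wireless sensor network design",
    "information security for cloud computing services",
]
bg_docs = [
    "heat transfer in porous media",
    "clinical trial of heat therapy in patients",
]
articles = [
    "a cloud computing framework for information security",
    "heat transfer in clinical patients",
]

it_counts = fit_counts(it_docs)
bg_counts = fit_counts(bg_docs)
for prob, article in retrieve(articles, it_counts, bg_counts, top_k=2):
    print(f"{prob:.3f}  {article}")
```

In this sketch, raw bigram counts stand in for the extracted keyphrase set. A real pipeline would first select keyphrases, which the abstract says is done without relying on TF-IDF, and would choose the retrieval cutoff from the distribution of membership probabilities.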


Bibliographic details

Source: Library Review, 2017, Issue 7, pp. 549-569 (21 pages)
Authors: Mohebi Azadeh; Sedighi Mehri; Zargaran Zahra
Affiliation (all three authors): Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran
Original format: PDF
Language: English
Keywords: Document retrieval; Information retrieval; Information technology; Keyphrase extraction; Probabilistic modeling; Scientometrics


