Reducing efforts of software engineering systematic literature reviews updates using text classification

Watanabe Willian Massami; Felizardo Katia Romero; Candido Jr Arnaldo; de Souza Erica Ferreira; de Campos Neto Jose Ede; Vijaykumar Nandamudi Lankalapalli

Abstract

Context: Systematic Literature Reviews (SLRs) are frequently used to synthesize evidence in Software Engineering (SE); however, replicating SLRs and keeping them up to date is a major challenge. Study selection in an SLR is labor-intensive because of the large number of studies that must be analyzed. Different approaches, such as Visual Text Mining and Text Classification, have been investigated to support SLR processes, but acquiring the initial dataset is time-consuming and labor-intensive.

Objective: In this work, we proposed and evaluated the use of Text Classification to support the selection of new evidence when updating SLRs in SE.

Method: We applied Text Classification techniques to investigate how effective they are and how much effort they could save during the study selection phase of an SLR update. In an SLR update scenario, the studies analyzed in the original SLR can serve as a labeled dataset for training Supervised Machine Learning algorithms. We conducted an experiment with 8 Software Engineering SLRs, investigating multiple preprocessing and feature extraction tasks, such as tokenization, stop-word removal, word lemmatization and TF-IDF (Term Frequency/Inverse Document Frequency), with Decision Tree and Support Vector Machines as classification algorithms. Furthermore, we configured the classifier activation threshold to maximize Recall, thereby reducing the number of Missed selected studies.

Results: We measured the accuracy of the techniques. When the activation threshold of the classifiers was varied, they achieved on average an F-Score of 0.92 and an exclusion rate of 62%, with an average of 4% Missed selected studies. Both the exclusion rate and the number of Missed selected studies differed significantly from those of classifiers that did not use the activation threshold configuration.

Conclusion: The results show the potential of these techniques to reduce the effort required for SLR updates.
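
The preprocessing and feature extraction steps named in the Method (tokenization, stop-word removal, TF-IDF) can be illustrated with a minimal, dependency-free Python sketch. The stop-word list, the function names `tokenize` and `tf_idf`, the unsmoothed IDF formula log(N/df) and the example titles are assumptions made for illustration, not details taken from the paper. Lemmatization is omitted here; a library such as NLTK (`WordNetLemmatizer`) could supply it.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real pipelines use a much longer list).
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    TF is the term's relative frequency in the document; IDF is log(N / df),
    so a term that occurs in every document gets weight 0.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            {term: (c / length) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return vectors


# Hypothetical titles standing in for studies screened in an original SLR.
docs = [
    "Text classification for systematic reviews",
    "Updating systematic literature reviews",
    "Visual text mining of software engineering studies",
]
vectors = tf_idf(docs)
print(sorted(vectors[0].items()))
```

Terms that are rare across the collection (here "classification") receive higher weights than shared terms such as "text". Each vector could then be passed to a classifier such as an SVM or a Decision Tree; in practice a library implementation (for example scikit-learn's `TfidfVectorizer`, which applies a smoothed IDF by default) would typically replace this hand-rolled version.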

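The threshold configuration described in the Method and the metrics reported in the Results can be made concrete with a small sketch. It assumes that the trained classifier (an SVM or Decision Tree in the paper) assigns each candidate study a score in [0, 1], that the studies screened in the original SLR supply the ground-truth labels, that "exclusion rate" is the share of candidates the classifier marks for exclusion, and that "missed" is the share of truly relevant studies it excludes. These definitions, the function names `evaluate` and `pick_threshold`, and the scores below are hypothetical, not the paper's code or data.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    threshold: float
    precision: float
    recall: float
    f_score: float         # F1: harmonic mean of precision and recall
    exclusion_rate: float  # share of candidate studies marked "exclude" (assumed definition)
    missed_rate: float     # share of truly relevant studies marked "exclude" (1 - recall)


def evaluate(scores: list[float], labels: list[bool], threshold: float) -> Outcome:
    """Mark a study 'include' when its score is >= threshold and compare with labels."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum(y and not p for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    excluded = sum(not p for p in predicted)
    return Outcome(
        threshold=threshold,
        precision=precision,
        recall=recall,
        f_score=f_score,
        exclusion_rate=excluded / len(scores),
        missed_rate=fn / (tp + fn) if tp + fn else 0.0,
    )


def pick_threshold(scores: list[float], labels: list[bool], target_recall: float = 1.0) -> float:
    """Return the highest observed score whose threshold still reaches target_recall.

    Recall never decreases as the threshold drops, so scanning scores from high
    to low and stopping at the first hit keeps as many studies excluded as possible.
    """
    for t in sorted(set(scores), reverse=True):
        if evaluate(scores, labels, t).recall >= target_recall:
            return t
    return min(scores)  # lowest score: every study is included


# Hypothetical classifier scores and labels (True = relevant study).
scores = [0.92, 0.81, 0.66, 0.47, 0.38, 0.25, 0.18, 0.12, 0.07, 0.03]
labels = [True, True, False, True, False, False, False, False, False, False]

default = evaluate(scores, labels, 0.5)
tuned = evaluate(scores, labels, pick_threshold(scores, labels, target_recall=1.0))
print(default)
print(tuned)
```

On these toy scores, the default threshold of 0.5 excludes 70% of the candidates but misses one of the three relevant studies, whereas the tuned threshold (0.47) excludes 60% and misses none. The numbers are illustrative only. Choosing the threshold on the original SLR's labeled studies and then applying it to the new candidates of the update is one plausible protocol, assumed here rather than taken from the paper.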

Bibliographic details

Source: Information and Software Technology, 2020, issue 12, pp. 106395.1-106395.15 (15 pages)
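Author affiliations:
Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, PR, Brazil
Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, PR, Brazil
Federal University of Technology - Paraná (UTFPR), Medianeira, PR, Brazil
Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, PR, Brazil
Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, PR, Brazil
National Institute for Space Research (INPE), São José dos Campos, SP, Brazil; Federal University of São Paulo (UNIFESP), São José dos Campos, SP, Brazil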

Indexing information
Original format: PDF
Language: English
Keywords: Systematic literature review; SLR; Automatic selection; Review update; Text classification; Document classification; Text categorization

Similar documents

1. Guidelines for the search strategy to update systematic literature reviews software engineering [J]. Wohlin Claes, Mendes Emilia, Felizardo Katia Romero. Information and Software Technology, 2020, issue Nova.
2. When to update systematic literature reviews in software engineering [J]. Emilia Mendes, Claes Wohlin, Katia Felizardo. The Journal of Systems and Software, 2020, issue Sepa.
3. Six years of systematic literature reviews in software engineering: An updated tertiary study [J]. Fabio Q.B. da Silva, Andre L.M. Santos, Sergio Soares. Information and Software Technology, 2011, issue 9.
4. The Use of Grey Literature and Google Scholar in Software Engineering Systematic Literature Reviews [C]. Rubia Fatima, Affan Yasin, Lin Liu. IEEE Annual Computers, Software, and Applications Conference, 2020.
5. A Systematic Literature Review of Software Engineering for Scientific and Engineering Software and an Industrial Oil Pipeline Software Case Study [D]. Farhoodi, Roshanak. 2011.
6. Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews [O]. E. Popoff, M. Besada, J. P. Jansen. 2020.
7. Using Information Extraction and Text Classification in an Effort to Support Systematic Literature Reviews [O]. Lazreg Sofien. 2012.
