Journal: Information and Software Technology

Reducing efforts of software engineering systematic literature reviews updates using text classification



Abstract

Context: Systematic Literature Reviews (SLRs) are frequently used to synthesize evidence in Software Engineering (SE); however, replicating SLRs and keeping them up to date is a major challenge. The study selection activity in an SLR is labor intensive because of the large number of studies that must be analyzed. Different approaches have been investigated to support SLR processes, such as Visual Text Mining and Text Classification. However, acquiring the initial dataset is time-consuming and labor intensive.

Objective: In this work, we propose and evaluate the use of Text Classification to support the selection of new evidence when updating SLRs in SE.

Method: We applied Text Classification techniques to investigate how effective they are and how much effort could be saved during the study selection phase of an SLR update. In the SLR update scenario, the studies analyzed in the primary SLR can be used as a labeled dataset to train Supervised Machine Learning algorithms. We conducted an experiment with 8 Software Engineering SLRs. In the experiment, we investigated multiple preprocessing and feature extraction tasks, such as tokenization, stop word removal, word lemmatization, and TF-IDF (Term Frequency/Inverse Document Frequency), with Decision Tree and Support Vector Machines as classification algorithms. Furthermore, we configured the classifier activation threshold to maximize Recall, thereby reducing the number of missed selected studies.

Results: We measured the accuracy of the techniques. On average, they achieved an F-score of 0.92 and an exclusion rate of 62% when the activation threshold of the classifiers was varied, with an average of 4% missed selected studies. Both the exclusion rate and the number of missed selected studies were significantly different from those of classifiers that did not use the activation threshold configuration.

Conclusion: The results show the potential of the techniques for reducing the effort required for SLR updates.
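The idea of configuring an activation threshold to favor Recall can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pick_threshold` and `screening_metrics` are hypothetical, the scores are made-up illustration data rather than results from the study, and the metric definitions are assumptions (exclusion rate taken as the share of candidate studies scored below the threshold, missed rate as the share of truly relevant studies among those excluded). The scores stand in for the output of any classifier, for example a Support Vector Machine trained on TF-IDF features.

```python
from typing import List, Tuple


def pick_threshold(scores: List[float], labels: List[int],
                   min_recall: float = 1.0) -> float:
    """Return the highest threshold whose recall on labeled data is >= min_recall.

    scores: classifier scores for studies from the original SLR (assumed input).
    labels: 1 if the study was included in the original SLR, else 0.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one included study")
    best = min(scores)  # the lowest score keeps every study, so recall is 1.0
    # Recall can only drop as the threshold rises, so the last passing value wins.
    for t in sorted(set(scores)):
        kept = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        if kept / positives >= min_recall:
            best = t
    return best


def screening_metrics(scores: List[float], labels: List[int],
                      threshold: float) -> Tuple[float, float]:
    """Return (exclusion_rate, missed_rate) under the assumed definitions."""
    excluded = [y for s, y in zip(scores, labels) if s < threshold]
    exclusion_rate = len(excluded) / len(labels)
    positives = sum(labels)
    missed_rate = sum(excluded) / positives if positives else 0.0
    return exclusion_rate, missed_rate


# Made-up scores for studies already screened in the original SLR.
train_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.3, 0.05]
train_labels = [1, 1, 1, 0, 0, 0, 0, 0]
threshold = pick_threshold(train_scores, train_labels, min_recall=1.0)

# Made-up scores for candidate studies found during the update.
update_scores = [0.7, 0.4, 0.3, 0.15, 0.05]
update_labels = [1, 0, 1, 0, 0]
exclusion_rate, missed_rate = screening_metrics(update_scores, update_labels, threshold)
print(threshold, exclusion_rate, missed_rate)
```

In this toy data, a threshold tuned for full Recall on the original review still misses a relevant study in the update set, which shows why the share of missed studies is worth reporting alongside the exclusion rate.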
