Conference: Asia-Pacific Software Engineering Conference

Software Defect Prediction and Localization with Attention-Based Models and Ensemble Learning


Abstract

Software defect prediction (SDP) uses a model trained on historical defect data to predict the defect proneness of code modules in a software system. An effective model can optimize the allocation of testing resources and thereby improve the quality of software products. Most previous studies represent code snippets with handcrafted features, which struggle to capture the semantic and structural information of the code context that is often crucial for defect prediction. Moreover, most existing SDP models cannot make predictions at the code-line level, so they offer developers little detailed reference information. To address these issues, we propose a model based on ensemble learning and attention mechanisms that gives developers more comprehensive information by locating suspect lines of code while making method-level defect predictions. The model uses abstract syntax trees (ASTs) as the intermediate representation of code snippets. Because historical defect data is markedly class-imbalanced, an approach based on Self-Organizing Map (SOM) clustering is employed to handle noisy data. Experimental results show that, on average, the proposed model improves the F-measure by 17.7% and AUC by 37.8% compared with four other machine learning algorithms.
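
As a rough illustration of how an AST can serve as an intermediate representation that still maps back to source lines, the sketch below flattens a snippet into node-type tokens grouped by line number. It is not the authors' pipeline: the abstract does not name the target language or parser, so Python's standard `ast` module is used here purely as an assumption, and the helper names `node_sequence` and `group_by_line` are hypothetical.

```python
import ast


def node_sequence(source: str):
    """Return (node_type, line_number) pairs for AST nodes that carry a line number."""
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        # Nodes such as operators and contexts have no position; skip them.
        lineno = getattr(node, "lineno", None)
        if lineno is not None:
            pairs.append((type(node).__name__, lineno))
    return pairs


def group_by_line(pairs):
    """Group node-type tokens by source line (illustrative helper)."""
    lines = {}
    for name, lineno in pairs:
        lines.setdefault(lineno, []).append(name)
    return lines


# A tiny method used only as sample input.
SAMPLE = "def div(a, b):\n    return a / b\n"

if __name__ == "__main__":
    for lineno, tokens in sorted(group_by_line(node_sequence(SAMPLE)).items()):
        print(lineno, tokens)
```

Because each token keeps its line number, per-token scores from a model (for example attention weights) could in principle be aggregated per line to rank suspect lines. How the paper performs this aggregation is not stated in the abstract.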
