Conference: IEEE International Symposium on Software Reliability Engineering

Filtering noise in mixed-purpose fixing commits to improve defect prediction and localization

Abstract

In open-source software projects, developers who are fixing software faults sometimes also make other, non-fixing code changes, such as functionality enhancement, code restructuring or improvement, or documentation. They commit these non-fixing changes together with the fixing ones in the same transaction. We call such commits mixed-purpose fixing commits (MFCs). We conducted an empirical study of MFCs in several popular open-source projects. Our results show that MFCs account for about 11%–39% of all fixing commits. In 3%–41% of MFCs, developers made other types of changes without mentioning them in the commit logs. Our study also shows that mining software repositories (MSR) approaches that rely on recovering the history of fixed or buggy files are affected by noisy data, in which non-fixing changes in MFCs are treated as fixing ones. These results motivated us to develop Cardo, a tool that identifies MFCs and filters non-fixing changed files out of the change sets of fixing commits. Cardo uses natural language processing to analyze the sentences in commit logs, and program analysis to cluster the changes in a change set, in order to determine whether a changed file serves a non-fixing purpose. Our empirical evaluation on several open-source projects shows that Cardo achieves 93% precision on average, and that existing MSR approaches achieve a relative improvement of up to 32% when run on data filtered by Cardo.
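To make the idea of spotting an MFC from its commit log more concrete, the following is a minimal sketch in Python. It is not Cardo's actual algorithm, which the abstract does not specify beyond "natural language processing" and "program analysis". The keyword lists, the sentence splitter, and the function names (`split_sentences`, `label_sentence`, `is_mixed_purpose`) are all assumptions made for illustration.

```python
import re

# Hypothetical keyword lists, chosen for illustration only.
# They are NOT taken from Cardo or from the paper.
FIX_KEYWORDS = {"fix", "fixes", "fixed", "bug", "fault", "defect", "crash"}
NON_FIX_KEYWORDS = {
    "enhancement": {"add", "adds", "added", "support", "feature"},
    "refactoring": {"refactor", "refactored", "rename", "renamed",
                    "cleanup", "restructure"},
    "documentation": {"doc", "docs", "javadoc", "comment", "comments",
                      "readme"},
}


def split_sentences(log):
    """Naively split a commit log into sentences on '.', ';' and newlines."""
    parts = re.split(r"[.;\n]+", log)
    return [p.strip() for p in parts if p.strip()]


def label_sentence(sentence):
    """Return the set of change-type labels whose keywords occur in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    labels = set()
    if words & FIX_KEYWORDS:
        labels.add("fix")
    for name, keywords in NON_FIX_KEYWORDS.items():
        if words & keywords:
            labels.add(name)
    return labels


def is_mixed_purpose(log):
    """Flag a log that mentions a fix plus at least one non-fixing change type."""
    labels = set()
    for sentence in split_sentences(log):
        labels |= label_sentence(sentence)
    return "fix" in labels and len(labels) > 1


if __name__ == "__main__":
    print(is_mixed_purpose("Fix NPE crash in parser. Also rename helper and update javadoc."))
    print(is_mixed_purpose("Fix null pointer bug in parser"))
```

A log-only heuristic like this cannot detect the MFCs in which developers did not mention the extra changes (3%–41% of MFCs according to the study), which is presumably why Cardo also analyzes the change set itself.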
