Conference: IEEE International Conference on Software Analysis, Evolution, and Reengineering

Dissection of a bug dataset: Anatomy of 395 patches from Defects4J

Abstract

Well-designed and publicly available datasets of bugs are an invaluable asset for advancing research fields such as fault localization and program repair, as they allow direct and fair comparison between competing techniques and also the replication of experiments. These datasets need to be deeply understood by researchers: the answers to questions like "which bugs can my technique handle?" and "for which bugs is my technique effective?" depend on understanding the properties of bugs and their patches. However, such properties are usually not included in the datasets, and there is still no widely adopted methodology for characterizing bugs and patches. In this work, we study in depth 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were automatically extracted, whereas qualitative ones (repair actions and patterns) were manually extracted using a thematic analysis-based approach. We found that 1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only added lines; 2) 92% of the patches change only one file, and 38% have no spreading at all; 3) the three most applied repair actions are the addition of method calls, conditionals, and assignments, occurring in 77% of the patches; and 4) nine repair patterns were found for 95% of the patches, the most prevalent of which, appearing in 43% of the patches, involves conditional blocks. These results are useful for researchers who want to perform advanced analysis of their techniques' results on Defects4J. Moreover, our set of properties can be used to characterize and compare different bug datasets.
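To make the quantitative properties more concrete, the following is a minimal sketch of how patch size could be computed automatically from a unified diff. It is not the authors' tooling. It assumes that patch size means the number of added plus removed lines, and it reports the number of changed files and hunks only as a rough proxy for spreading; the paper's exact definitions may differ. The names `PatchStats`, `patch_stats`, and `EXAMPLE_DIFF` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hunk header of a unified diff, e.g. "@@ -10,2 +10,5 @@".
# A missing count means 1, as in the unified diff format.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


@dataclass
class PatchStats:
    files: int    # files touched by the patch
    hunks: int    # separate changed regions (rough proxy for spreading)
    added: int    # lines starting with "+" inside hunks
    removed: int  # lines starting with "-" inside hunks

    @property
    def size(self) -> int:
        # Assumption: patch size = added lines + removed lines.
        return self.added + self.removed

    @property
    def only_additions(self) -> bool:
        return self.added > 0 and self.removed == 0


def patch_stats(diff_text: str) -> PatchStats:
    files = hunks = added = removed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: classify the line by its first character.
            if line.startswith("+"):
                added += 1
                new_left -= 1
            elif line.startswith("-"):
                removed += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue
        match = HUNK_RE.match(line)
        if match:
            hunks += 1
            old_left = int(match.group(1)) if match.group(1) is not None else 1
            new_left = int(match.group(2)) if match.group(2) is not None else 1
        elif line.startswith("+++ "):
            files += 1  # one "+++" header per changed file
    return PatchStats(files, hunks, added, removed)


# Hypothetical patch that only adds a conditional block.
EXAMPLE_DIFF = "\n".join([
    "--- a/src/Foo.java",
    "+++ b/src/Foo.java",
    "@@ -10,2 +10,5 @@",
    " int x = compute();",
    "+if (x < 0) {",
    "+    return 0;",
    "+}",
    " return x;",
])

stats = patch_stats(EXAMPLE_DIFF)
print(stats, stats.size, stats.only_additions)
```

Tracking the line counts from each hunk header, rather than looking only at the leading character, keeps a removed line whose content begins with `-- ` from being mistaken for a file header.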
