Dissection of a bug dataset: Anatomy of 395 patches from Defects4J

Abstract

Well-designed and publicly available datasets of bugs are an invaluable asset for advancing research fields such as fault localization and program repair, as they allow direct and fair comparison between competing techniques as well as the replication of experiments. These datasets need to be deeply understood by researchers: the answers to questions like "which bugs can my technique handle?" and "for which bugs is my technique effective?" depend on the comprehension of properties related to bugs and their patches. However, such properties are usually not included in the datasets, and there is still no widely adopted methodology for characterizing bugs and patches. In this work, we deeply study 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were automatically extracted, whereas qualitative ones (repair actions and patterns) were manually extracted using a thematic analysis-based approach. We found that 1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only addition of lines; 2) 92% of the patches change only one file, and 38% have no spreading at all; 3) the top-3 most applied repair actions are addition of method calls, conditionals, and assignments, occurring in 77% of the patches; and 4) nine repair patterns were found for 95% of the patches, where the most prevalent, appearing in 43% of the patches, is on conditional blocks. These results are useful for researchers to perform advanced analysis on their techniques' results based on Defects4J. Moreover, our set of properties can be used to characterize and compare different bug datasets.

Bibliographic details

Source
IEEE International Conference on Software Analysis, Evolution, and Reengineering | 2018 | pp. 130-140 | 11 pages
Authors
Victor Sobreira; Thomas Durieux; Fernanda Madeiral; Martin Monperrus; Marcelo de Almeida Maia;
Original format: PDF
Keywords
Computer bugs; Maintenance engineering; Measurement; Manuals; Taxonomy; Data collection; Task analysis;
Date indexed: 2022-08-26 13:51:45

Similar literature

1. Automatic repair of real bugs in Java: a large-scale experiment on the Defects4J dataset [J] . Martinez Matias, Durieux Thomas, Sommerard Romain, Empirical Software Engineering . 2017, Issue 4
2. Where were the repair ingredients for Defects4J bugs? [J] . Yang Deheng, Liu Kui, Kim Dongsun, Empirical Software Engineering . 2021, Issue 6
3. Are datasets for information retrieval-based bug localization techniques trustworthy? Impact analysis of bug types on IRBL [J] . Kim Misoo, Lee Eunseok, Empirical Software Engineering . 2021, Issue 3
4. Dissection of a bug dataset: Anatomy of 395 patches from Defects4J [C] . Victor Sobreira, Thomas Durieux, Fernanda Madeiral, IEEE International Conference on Software Analysis, Evolution and Reengineering . 2018
5. Useful Learning Tools in Anatomy Dissection Video [D] . Forbes, Anna. 2019
6. ExpressionData - A public resource of high quality curated datasets representing gene expression across anatomy development and experimental conditions [O] . Philip Zimmermann, Stefan Bleuler, Oliver Laule, 2014
7. Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J [O] . Sobreira, Victor, Durieux, Thomas, Madeiral, Fernanda, 2018
