
An empirical study of bugs in test code


Abstract

Testing aims at detecting (regression) bugs in production code. However, test code is just as likely to contain bugs as the code it tests. Buggy test cases can silently miss bugs in the production code or loudly ring false alarms when the production code is correct. We present the first empirical study of bugs in test code to characterize their prevalence and root cause categories. We mine the bug repositories and version control systems of 211 Apache Software Foundation (ASF) projects and find 5,556 test-related bug reports. We (1) compare properties of test bugs with production bugs, such as active time and fixing effort needed, and (2) qualitatively study 443 randomly sampled test bug reports in detail and categorize them based on their impact and root causes. Our results show that (1) around half of all the projects had bugs in their test code; (2) the majority of test bugs are false alarms, i.e., the test fails while the production code is correct, while a minority of these bugs result in silent horrors, i.e., the test passes while the production code is incorrect; (3) incorrect and missing assertions are the dominant root cause of silent horror bugs; (4) semantic (25%), flaky (21%), and environment-related (18%) bugs are the dominant root cause categories of false alarms; (5) the majority of false alarm bugs happen in the exercise portion of the tests; and (6) developers contribute more actively to fixing test bugs, and test bugs are fixed sooner compared to production bugs. In addition, we evaluate whether existing bug detection tools can detect bugs in test code.
