Empirical Software Engineering

Are datasets for information retrieval-based bug localization techniques trustworthy? Impact analysis of bug types on IRBL


Abstract

Various evaluation datasets are used to measure the performance of information retrieval-based bug localization (IRBL) techniques. To evaluate IRBL accurately, and ultimately to improve its performance, the validity of these datasets must first be analyzed. To this end, we surveyed 50 previous studies, collected 41,754 bug reports, and identified critical problems that affect the validity of performance-evaluation results. These problems lie in both the ground truth and the search space, and they arise from mixing different bug types without clearly distinguishing them. We divided the bugs into production-related and test-related bugs, and based on this distinction we investigated and analyzed the impact of bug type on IRBL performance evaluation. Approximately 18.6% of the bug reports were linked to non-buggy files as their ground truth, and up to 58.5% of the source files in the search space introduced noise into the localization of a specific bug type. Our experiments confirmed that correcting an incorrect ground truth changed the average precision for approximately 90% of the affected bug reports, and that specifying a suitable search space changed the average precision for at least half of the bug reports. Further, we showed that these problems can alter the relative ranking of IRBL techniques. Our large-scale analysis demonstrates that a significant amount of noise occurs, which can compromise evaluation results. An important finding of this study is that accounting for bug types is essential to improve the accuracy of performance evaluation.

