Empirical Software Engineering

Are datasets for information retrieval-based bug localization techniques trustworthy? Impact analysis of bug types on IRBL


Abstract

Various evaluation datasets are used to measure the performance of information retrieval-based bug localization (IRBL) techniques. To evaluate IRBL accurately, and ultimately to improve its performance, the validity of these datasets must first be analyzed. To this end, we surveyed 50 previous studies, collected 41,754 bug reports, and identified critical problems that affect the validity of performance-evaluation results. These problems lie in both the ground truth and the search space, and they arise from mixing different bug types without clearly distinguishing them. We divided the bugs into production-related and test-related bugs, and based on this distinction we investigated and analyzed the impact of bug type on IRBL performance evaluation. Approximately 18.6% of the bug reports were linked to non-buggy files as their ground truth, and up to 58.5% of the source files in the search space introduced noise into the localization of a specific bug type. Our experiments confirmed that correcting an incorrect ground truth changed the average precision for approximately 90% of the affected bug reports, and that specifying a suitable search space changed the average precision for at least half of the bug reports. Further, we showed that these problems can alter the relative ranking of IRBL techniques. Our large-scale analysis demonstrates that a significant amount of noise occurs, which can compromise evaluation results. An important finding of this study is that accounting for bug types is essential to improve the accuracy of performance evaluation.

