Journal: Information and Software Technology

An HMM-based approach for automatic detection and classification of duplicate bug reports


Abstract

Context: Software projects rely on their issue tracking systems to guide the maintenance activities of software developers. Bug reports submitted to these systems carry crucial information about the nature of a crash, such as text written by users or developers and execution information about the functions that were running before the crash occurred. Large software projects typically receive thousands of reports every day.

Objective: The aim is to reduce the time and effort required to fix bugs while improving overall software quality. Previous studies have shown that a large number of bug reports are duplicates of previously reported ones. For example, as many as 30% of all reports for Firefox are duplicates.

Method: While a wide variety of approaches exist to automatically detect duplicate bug reports using natural language processing, only a few have considered the execution information (the so-called stack traces) inside bug reports. In this paper, we propose a novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models (HMMs).

Results: When applying our approach to Firefox and GNOME datasets, we show that, for Firefox, the average recall at rank k = 1 is 59% and at rank k = 2 is 75.55%. Recall reaches 90% from k = 10. The Mean Average Precision (MAP) value is up to 76.5%. For GNOME, the recall at k = 1 is around 63%, and this value increases by about 10% for k = 2. The recall increases to 97% for k = 11. A MAP value of up to 73% is achieved.

Conclusion: We show that HMMs and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
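The reported figures are recall at rank k and Mean Average Precision (MAP). The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch using the common textbook definitions of both metrics. The function names, the identifiers `bug_a`, `g1`, and so on, and the toy rankings are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def recall_at_k(rankings: Dict[str, List[str]],
                relevant: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of queries whose top-k candidates contain a true duplicate."""
    hits = sum(
        1
        for query, ranked in rankings.items()
        if any(candidate in relevant[query] for candidate in ranked[:k])
    )
    return hits / len(rankings)


def mean_average_precision(rankings: Dict[str, List[str]],
                           relevant: Dict[str, Set[str]]) -> float:
    """Mean over queries of the average precision of each ranked list."""
    total = 0.0
    for query, ranked in rankings.items():
        found = 0
        precision_sum = 0.0
        for position, candidate in enumerate(ranked, start=1):
            if candidate in relevant[query]:
                found += 1
                precision_sum += found / position
        if relevant[query]:
            total += precision_sum / len(relevant[query])
    return total / len(rankings)


# Hypothetical toy data: each incoming report gets a ranked list of
# candidate duplicate groups; the true group is "g1" in every case.
rankings = {
    "bug_a": ["g1", "g2", "g3"],  # correct group ranked first
    "bug_b": ["g2", "g1", "g3"],  # correct group ranked second
    "bug_c": ["g3", "g2", "g1"],  # correct group ranked third
}
relevant = {"bug_a": {"g1"}, "bug_b": {"g1"}, "bug_c": {"g1"}}

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(rankings, relevant, k):.3f}")
print(f"MAP = {mean_average_precision(rankings, relevant):.3f}")
```

On this toy data, recall rises from 1/3 at k = 1 to 1 at k = 3, which mirrors the way the reported recall grows with k. With a single relevant group per query, MAP reduces to the mean reciprocal rank of that group.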