Journal: Information and Software Technology

An HMM-based approach for automatic detection and classification of duplicate bug reports


Abstract

Context: Software projects rely on their issue tracking systems to guide the maintenance activities of software developers. Bug reports submitted to these systems carry crucial information about the nature of a crash, such as text from users or developers and execution information about the functions that were running before the crash occurred. Typically, big software projects receive thousands of reports every day.

Objective: The aim is to reduce the time and effort required to fix bugs while improving software quality overall. Previous studies have shown that a large number of bug reports are duplicates of previously reported ones. For example, as many as 30% of all reports for Firefox are duplicates.

Method: While there exists a wide variety of approaches that automatically detect duplicate bug reports using natural language processing, only a few have considered the execution information (the so-called stack traces) inside bug reports. In this paper, we propose a novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models (HMMs).

Results: When applying our approach to Firefox and GNOME datasets, we show that, for Firefox, the average recall for Rank k = 1 is 59% and for Rank k = 2 is 75.55%. Recall starts reaching 90% from k = 10. The Mean Average Precision (MAP) value is up to 76.5%. For GNOME, the recall at k = 1 is around 63%, and this value increases by about 10% for k = 2. The recall increases to 97% for k = 11. A MAP value of up to 73% is achieved.

Conclusion: We show that HMMs and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
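The results above are reported as recall at rank k and Mean Average Precision. As a rough illustration only, the sketch below computes both under one common definition: a query counts as a hit at rank k if any correct duplicate group appears among its top k candidates, and average precision is the mean of the precision values at each rank where a correct candidate appears. The abstract does not state the exact definitions the authors used, so these formulas, the function names, and the toy group labels (`g1`, `g2`, `g3`) are all assumptions made for this example.

```python
from typing import Hashable, Sequence, Set


def recall_at_k(rankings: Sequence[Sequence[Hashable]],
                relevant: Sequence[Set[Hashable]],
                k: int) -> float:
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(
        1 for ranked, rel in zip(rankings, relevant)
        if any(item in rel for item in ranked[:k])
    )
    return hits / len(rankings)


def average_precision(ranked: Sequence[Hashable], rel: Set[Hashable]) -> float:
    """Mean of precision values taken at each rank holding a relevant item."""
    if not rel:
        return 0.0
    hits = 0
    total = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in rel:
            hits += 1
            total += hits / position
    return total / len(rel)


def mean_average_precision(rankings: Sequence[Sequence[Hashable]],
                           relevant: Sequence[Set[Hashable]]) -> float:
    """Average precision averaged over all queries."""
    scores = [average_precision(r, s) for r, s in zip(rankings, relevant)]
    return sum(scores) / len(scores)


# Hypothetical toy data: three incoming reports, each ranked against three
# candidate duplicate groups; the correct group is "g1" in every case.
rankings = [["g1", "g2", "g3"], ["g2", "g1", "g3"], ["g3", "g2", "g1"]]
relevant = [{"g1"}, {"g1"}, {"g1"}]

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(rankings, relevant, k):.3f}")
print(f"MAP = {mean_average_precision(rankings, relevant):.3f}")
```

In the toy data the correct group sits at rank 1, 2, and 3 respectively, so recall grows with k in the same way the reported figures rise between k = 1 and k = 10 or 11.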