An HMM-based approach for automatic detection and classification of duplicate bug reports

Ebrahimi Neda; Trabelsi Abdelaziz; Islam Md Shariful; Hamou-Lhadj Abdelwahab; Khanmohammadi Kobra

Abstract

Context: Software projects rely on their issue tracking systems to guide the maintenance activities of software developers. Bug reports submitted to these systems carry crucial information about the nature of a crash, such as text written by users or developers and execution information about the functions that were running before the crash occurred. Typically, big software projects receive thousands of reports every day.

Objective: The aim is to reduce the time and effort required to fix bugs while improving software quality overall. Previous studies have shown that a large number of bug reports are duplicates of previously reported ones. For example, as many as 30% of all reports for Firefox are duplicates.

Method: While there exists a wide variety of approaches that automatically detect duplicate bug reports using natural language processing, only a few have considered the execution information (the so-called stack traces) inside bug reports. In this paper, we propose a novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models.

Results: When applying our approach to Firefox and GNOME datasets, we show that, for Firefox, the average recall at rank k = 1 is 59% and at rank k = 2 is 75.55%. Recall reaches 90% from k = 10 onward. The Mean Average Precision (MAP) value is up to 76.5%. For GNOME, the recall at k = 1 is around 63%, and this value increases by about 10% for k = 2. The recall increases to 97% for k = 11. A MAP value of up to 73% is achieved.

Conclusion: We show that HMM and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
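The abstract does not say how the models are built or queried, so the following is only a minimal sketch of the general idea under stated assumptions: each group of known duplicates is represented by one discrete HMM, a stack trace is encoded as a sequence of integer function ids, and an incoming trace is ranked against the groups by its log-likelihood under each model (computed with the standard scaled forward algorithm). The names `log_likelihood`, `rank_groups`, `recall_at_k`, and the toy parameters in `models` are hypothetical and hand-set for illustration; in practice the parameters would be learned from training traces.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o (a function id)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this trace
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def rank_groups(trace, models):
    """Return group names sorted from most to least likely for this trace."""
    scores = {name: log_likelihood(trace, *params) for name, params in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(rankings, truths, k):
    """Fraction of queries whose true group appears in the top k candidates."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(rankings, truths))
    return hits / len(truths)


# Toy models (assumed values): two hidden states, two function ids (0 and 1).
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "group_a": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "group_b": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

queries = [[0, 0, 1], [1, 1, 0]]
rankings = [rank_groups(q, models) for q in queries]
print(rankings)
print(recall_at_k(rankings, ["group_a", "group_b"], k=1))
```

Recall at rank k, as reported in the results, rewards a method whenever the correct group is among the first k candidates shown to a triager, which is why the figures rise as k grows.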


Bibliographic details

Source
Information and Software Technology | 2019, Issue 9 | pp. 98-109 | 12 pages
Authors
Ebrahimi Neda; Trabelsi Abdelaziz; Islam Md Shariful; Hamou-Lhadj Abdelwahab; Khanmohammadi Kobra
Affiliations

Concordia Univ Dept Elect & Comp Engn Montreal PQ Canada (listed for each of the five authors)

Original format: PDF
Language: English
Keywords
Duplicate bug reports; Stack traces; Hidden Markov models; Machine learning; Mining software repositories;

