An HMM-based approach for automatic detection and classification of duplicate bug reports

Ebrahimi Neda; Trabelsi Abdelaziz; Islam Md Shariful; Hamou-Lhadj Abdelwahab; Khanmohammadi Kobra

Abstract

Context: Software projects rely on their issue tracking systems to guide the maintenance activities of software developers. Bug reports submitted to these systems carry crucial information about the nature of a crash, such as text written by users or developers and execution information about the functions that were running before the crash occurred. Large software projects typically receive thousands of reports every day.

Objective: The aim is to reduce the time and effort required to fix bugs while improving overall software quality. Previous studies have shown that a large number of bug reports are duplicates of previously reported ones. For example, as many as 30% of all reports for Firefox are duplicates.

Method: While a wide variety of approaches exist to automatically detect duplicate bug reports using natural language processing, only a few have considered the execution information (the so-called stack traces) inside bug reports. In this paper, we propose a novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models.

Results: When applying our approach to Firefox and GNOME datasets, we show that, for Firefox, the average recall at Rank k = 1 is 59% and at Rank k = 2 is 75.55%. Recall reaches 90% from k = 10 onward. The Mean Average Precision (MAP) value is up to 76.5%. For GNOME, the recall at k = 1 is around 63%, and this value increases by about 10% for k = 2. The recall increases to 97% for k = 11. A MAP value of up to 73% is achieved.

Conclusion: We show that HMM and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
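The results above are reported as recall at rank k and Mean Average Precision (MAP). The abstract does not give the exact definitions the authors used, so the following is only a minimal sketch of how such metrics are commonly computed for a ranked list of candidate duplicate groups. It assumes each incoming report has exactly one correct duplicate group, in which case average precision reduces to the reciprocal of the rank of that group. The function names, the group labels, and the toy rankings are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def recall_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of query reports whose true duplicate group is in the top k candidates."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def mean_average_precision(rankings: Sequence[Sequence[str]], truths: Sequence[str]) -> float:
    """MAP under the assumption of one correct group per query.

    With a single relevant item, average precision is 1 / rank of that item,
    or 0 when the item is missing from the ranked list.
    """
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            total += 1.0 / (list(ranked).index(truth) + 1)
    return total / len(truths)


# Hypothetical toy data: three query reports, each truly belonging to group "A".
# Each inner list is the candidate groups ordered from best to worst score.
rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
truths = ["A", "A", "A"]

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(rankings, truths, k):.3f}")
print(f"MAP = {mean_average_precision(rankings, truths):.3f}")
```

In this toy data the correct group is ranked first, second, and third respectively, so recall rises with k in the same way the Firefox and GNOME figures do, while MAP rewards rankings that place the correct group near the top.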


Bibliographic record

Source: Information and Software Technology, 2019, issue 9, pp. 98-109 (12 pages)
Authors: Ebrahimi Neda; Trabelsi Abdelaziz; Islam Md Shariful; Hamou-Lhadj Abdelwahab; Khanmohammadi Kobra
Affiliation (listed identically for all five authors): Concordia Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Original format: PDF
Language: English
Keywords: Duplicate bug reports; Stack traces; Hidden Markov models; Machine learning; Mining software repositories

Similar literature

1. A Novel Technique for Duplicate Detection and Classification of Bug Reports [J]. Tao Zhang, Byungjeong Lee. IEICE Transactions on Information and Systems, 2014, issue 7.
2. A contextual approach towards more accurate duplicate bug report detection and ranking [J]. Hindle Abram, Alipour Anahita, Stroulia Eleni. Empirical Software Engineering, 2016, issue 2.
3. Improving Performance of Automatic Duplicate Bug Reports Detection using Longest Common Sequence: Introducing New Textual Features for Textual Similarity Detection [C]. Behzad Soleimani Neysiani, Seyed Morteza Babamir. 2019 IEEE 5th Conference on Knowledge Based Engineering and Innovation, 2019.
4. A contextual approach towards more accurate duplicate bug report detection [D]. Alipour, Anahita. 2013.
5. Hybrid Continuous Density Hmm-Based Ensemble Neural Networks for Sensor Fault Detection and Classification in Wireless Sensor Network [O]. Malathy Emperuman, Srimathi Chandrasekaran. 2020.
