IEEE International Conference on Software Quality, Reliability and Security

DURFEX: A Feature Extraction Technique for Efficient Detection of Duplicate Bug Reports



Abstract

The detection of duplicate bug reports can help reduce the time spent handling field crashes. This is especially important for software companies with a large client base, where multiple customers may submit bug reports caused by the same faults. Several techniques exist for detecting duplicate bug reports; many rely on some form of classification applied to information extracted from stack traces. They classify each report using the functions invoked in the stack trace associated with the bug report. The problem is that typical bug repositories may contain stack traces with tens of thousands of distinct functions, which leads to the curse of dimensionality. In this paper, we propose a feature extraction technique that reduces the feature size while retaining the information that is most critical for classification. The proposed approach starts by abstracting stack traces of function calls into sequences of package names, replacing each function with the package in which it is defined. We then segment these traces into multiple N-grams of variable length and map them to fixed-size sparse feature vectors, which are used to measure the distance between the stack trace of an incoming bug report and the stack traces of a historical set of bug reports. A linear combination of stack trace similarity and non-textual fields such as component and severity is then used to measure the distance between a bug report and a historical set of bug reports. We show the effectiveness of our approach by applying it to the Eclipse bug repository, which contains tens of thousands of bug reports. Our approach outperforms the approach that uses distinct function names, while significantly reducing the processing time.
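The pipeline the abstract describes (package abstraction, variable-length N-gram segmentation, sparse feature vectors, and a linear combination with categorical fields) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function-to-package table, the similarity measure (cosine), and the weights are hypothetical placeholders.

```python
import math
from collections import Counter

# Hypothetical function -> defining-package table (an assumption for
# illustration; in practice this mapping comes from the code base).
PACKAGE_OF = {
    "Workbench.runUI": "org.eclipse.ui",
    "Display.readAndDispatch": "org.eclipse.swt",
    "ArrayList.get": "java.util",
}

def abstract_trace(stack_trace, package_of):
    """Replace each function in a stack trace with its defining package."""
    return [package_of.get(fn, "<unknown>") for fn in stack_trace]

def ngrams(seq, max_n):
    """Segment a sequence into all N-grams of length 1..max_n."""
    return [tuple(seq[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(seq) - n + 1)]

def vectorize(trace, package_of, max_n=2):
    """Map a stack trace to a sparse N-gram frequency vector."""
    return Counter(ngrams(abstract_trace(trace, package_of), max_n))

def cosine_sim(a, b):
    """Cosine similarity between two sparse vectors (Counters)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def report_distance(r1, r2, w_trace=0.7, w_comp=0.2, w_sev=0.1):
    """Distance as 1 minus a linear combination of trace similarity and
    matches on non-textual fields (weights are illustrative)."""
    sim = w_trace * cosine_sim(r1["vec"], r2["vec"])
    sim += w_comp * (r1["component"] == r2["component"])
    sim += w_sev * (r1["severity"] == r2["severity"])
    return 1.0 - sim
```

An incoming report would be vectorized once and compared against the stored vectors of historical reports, ranking candidates by ascending distance; because the vectors are over packages rather than distinct function names, their dimensionality stays small.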

