IEEE 14th International Symposium on High-Assurance Systems Engineering

Automated Duplicate Bug Report Classification Using Subsequence Matching



Abstract

The use of open bug tracking repositories like Bugzilla is common in many software applications. They allow developers, testers and users to report problems associated with the system and track resolution status. Open and democratic reporting tools, however, face one major challenge: users can, and often do, submit reports describing the same problem. Research in duplicate report detection has primarily focused on word-frequency-based similarity measures, paying little regard to the context or structure of the reporting language. Thus, in large repositories, reports describing different issues may be marked as duplicates due to the frequent use of common words. In this paper, we present Factor LCS, a methodology which utilizes common sequence matching for duplicate report detection. We demonstrate the approach by analyzing the complete Firefox bug repository up until March 2012 as well as a smaller subset of the Eclipse dataset from January 1, 2008 to December 31, 2008. We achieve a duplicate recall rate above 70% with Firefox, which exceeds the results reported on smaller subsets of the same repository.
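To illustrate the contrast the abstract draws between word-frequency measures and sequence matching, the following is a minimal sketch of a word-level longest-common-subsequence (LCS) similarity between two report summaries. It is not the authors' Factor LCS method, whose details are not given in the abstract. The function names `lcs_length` and `lcs_similarity`, the whitespace tokenization, and the normalization by the longer report are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists.

    Standard dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the best subsequence ending before both.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the better of skipping x or skipping y.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(report_a, report_b):
    """Hypothetical score in [0, 1]: LCS length over the longer token count.

    Tokenization (lowercase, split on whitespace) and normalization are
    illustrative assumptions, not taken from the paper.
    """
    tokens_a = report_a.lower().split()
    tokens_b = report_b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    return lcs_length(tokens_a, tokens_b) / max(len(tokens_a), len(tokens_b))


if __name__ == "__main__":
    first = "browser crashes when opening a new tab"
    second = "opening a new tab when browser crashes"
    # Both summaries contain exactly the same words, so a pure bag-of-words
    # overlap cannot tell them apart; the LCS score also reflects word order.
    print(lcs_similarity(first, second))
```

Because the score depends on the order of shared words and not only on their presence, two reports that merely reuse common vocabulary in different arrangements score lower than under a bag-of-words comparison.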
