Revisiting the experimental design choices for approaches for the automated retrieval of duplicate issue reports

Abstract

Issue tracking systems (ITSs), such as Bugzilla, are commonly used to track reported bugs and change requests. Duplicate reports are considered a hindrance to developers and a drain on their resources. To avoid wasting developer resources on previously reported (i.e., duplicate) issues, such duplicates need to be identified as soon as they are reported. In recent years, several approaches have been proposed for the automated retrieval of duplicate reports. These approaches leverage the textual, categorical, and contextual information in previously reported issues to determine whether a newly reported issue has already been reported. In general, studies that evaluate these approaches treat all duplicate issue reports equally, use data chunks that span a relatively short period of time, and ignore the impact of newly activated features (e.g., just-in-time (JIT) lightweight retrieval of duplicates at filing time) in recent issue tracking systems.

This thesis revisits the experimental design choices of such prior studies from three perspectives: 1) the performance measures used, 2) the evaluation process, and 3) the choice of experimental data. For the performance measures, we highlight the need for effort-aware evaluation of such approaches, since identifying a considerable share of duplicate reports (over 50%) appears to be a relatively trivial task.

For the evaluation process, we show that the previously reported performance of such approaches is significantly overestimated.

Finally, recent versions of ITSs perform JIT lightweight retrieval of duplicate issue reports at the filing time of an issue report. The aim of such JIT retrieval is to avoid the filing of duplicates. We show that future studies of the automated retrieval of duplicate reports have to focus on after-JIT duplicates, as these duplicates are more representative of issue reports in practice nowadays.

The results of this thesis highlight the current state of progress in the automated retrieval of duplicate reports while charting directions for future research efforts.

Bibliographic record

  • Author: Rakha, Mohamed Sami
  • Author affiliation: Queen's University (Canada)
  • Degree-granting institution: Queen's University (Canada)
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 179
  • Original format: PDF
  • Language: English
