Venue: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Train One Get One Free: Partially Supervised Neural Network for Bug Report Duplicate Detection and Clustering


Abstract

Tracking user-reported bugs requires considerable engineering effort to go through many repetitive reports and assign them to the correct teams. This paper proposes a neural architecture that can jointly (1) detect whether two bug reports are duplicates and (2) aggregate them into latent topics. Leveraging the assumption that learning the topic of a bug is a sub-task of detecting duplicates, we design a loss function that can jointly perform both tasks but needs supervision only for duplicate classification, achieving topic clustering in an unsupervised fashion. We use a two-step attention module that applies self-attention for topic clustering and conditional attention for duplicate detection. We study the characteristics of two types of real-world datasets that have been marked for duplicate bugs by engineers and by non-technical annotators. The results demonstrate that our model not only can outperform state-of-the-art methods for duplicate classification in both cases, but can also learn meaningful latent clusters without additional supervision.
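The abstract does not give the model's equations, so the following is only an illustrative sketch of the general idea: a self-attention summary feeds a soft topic assignment, a conditional-attention pass re-reads each report given the other, and a single duplicate-classification loss is the only supervised signal. Every function name, the dot-product scoring, and the way topic overlap enters the duplicate logit are assumptions made for illustration, not the authors' architecture.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(tokens, query):
    """Attention-weighted average of token vectors, scored against `query`."""
    weights = softmax([dot(t, query) for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]

def topic_distribution(tokens, self_query, topic_vectors):
    """Step 1 (assumed): summarize one report, then soft-assign it to latent topics."""
    summary = attend(tokens, self_query)
    return summary, softmax([dot(summary, t) for t in topic_vectors])

def duplicate_probability(tokens_a, tokens_b, self_query, topic_vectors):
    """Step 2 (assumed): re-read each report conditioned on the other's summary."""
    summary_a, topics_a = topic_distribution(tokens_a, self_query, topic_vectors)
    summary_b, topics_b = topic_distribution(tokens_b, self_query, topic_vectors)
    a_given_b = attend(tokens_a, summary_b)
    b_given_a = attend(tokens_b, summary_a)
    # Hypothetical scoring: the logit depends on the conditioned views and on the
    # topic overlap, so the duplicate label alone also constrains the topic parameters.
    logit = dot(a_given_b, b_given_a) + dot(topics_a, topics_b)
    return 1.0 / (1.0 + math.exp(-logit))

def duplicate_loss(prob, is_duplicate):
    """Binary cross-entropy on the duplicate label; no topic labels are used."""
    eps = 1e-12
    return -math.log(prob + eps) if is_duplicate else -math.log(1.0 - prob + eps)
```

In this sketch the topic assignment is never compared against a topic label; it is shaped only because it contributes to the duplicate score, which mirrors the abstract's claim that clustering can emerge from duplicate supervision alone. A real implementation would learn `self_query` and `topic_vectors` by gradient descent in a deep-learning framework.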
