Venue: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement

Where Is the Road for Issue Reports Classification Based on Text Mining?



Abstract

Open source projects currently receive many kinds of issues every day because of the extreme openness of the Issue Tracking System (ITS) on GitHub. Categorizing these issues is a labor-intensive and time-consuming task for project managers. However, a contributor is only required to provide a short textual abstract to report an issue on GitHub. Thus, most traditional classification approaches, which rely on detailed and structured data (e.g., priority, severity, software version and so on), are difficult to adopt. In this paper, issue classification approaches were investigated on a large-scale dataset of 80 popular projects and over 252,000 issue reports collected from GitHub. First, four traditional text-based classification methods and their performance were discussed. Quantitative and qualitative study shows that semantic perplexity (i.e., an issue's description mixes bug-related sentences with non-bug-related sentences) is a crucial factor affecting classification performance. Finally, a two-stage classifier framework based on a novel metric of the semantic perplexity of issue reports was designed. Results show that the two-stage classification can significantly improve issue classification performance.
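The abstract does not give the paper's exact metric or classifiers, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats "semantic perplexity" as the share of sentences that disagree with the majority label in an issue, and routes low-perplexity and high-perplexity reports to different stages. The keyword list `BUG_WORDS`, the functions `perplexity_proxy` and `two_stage_classify`, the threshold of 0.3, and the stage-2 rule are all hypothetical stand-ins, not the authors' method.

```python
import re

# Hypothetical keyword list: the abstract does not say how sentences are labeled.
BUG_WORDS = {"crash", "error", "exception", "fail", "bug", "broken"}

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_bug_sentence(sentence):
    # A sentence counts as bug-related if it contains any keyword.
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & BUG_WORDS)

def perplexity_proxy(text):
    """Share of sentences in the minority class (0.0 = uniform, 0.5 = evenly mixed)."""
    sents = split_sentences(text)
    if not sents:
        return 0.0
    bug = sum(is_bug_sentence(s) for s in sents)
    return min(bug, len(sents) - bug) / len(sents)

def two_stage_classify(text, threshold=0.3):
    """Return (label, stage). The threshold value is an assumption."""
    sents = split_sentences(text)
    bug = sum(is_bug_sentence(s) for s in sents)
    if perplexity_proxy(text) <= threshold:
        # Stage 1: sentences mostly agree, so a majority vote is enough.
        label = "bug" if bug > len(sents) - bug else "non-bug"
        return label, "stage1"
    # Stage 2: placeholder rule for mixed reports (trust the opening sentence).
    # A real second stage would be a classifier trained on such reports.
    label = "bug" if is_bug_sentence(sents[0]) else "non-bug"
    return label, "stage2"

if __name__ == "__main__":
    print(two_stage_classify("The app will crash on startup. It throws an exception."))
    print(two_stage_classify("Please add dark mode. The settings page would crash without it."))
```

The point of the split is that reports whose sentences all point the same way can be handled by a simple model, while mixed reports are singled out for separate treatment.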
