Automated classification of software change messages by semi-supervised Latent Dirichlet Allocation


Abstract

Context: Topic models such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) have demonstrated success in software repository mining tasks. Understanding software change messages, which are written as unstructured natural-language text, is one of the fundamental challenges in mining these messages from repositories.

Objective: We seek to present a novel automatic change message classification method characterized by semi-supervised topic semantic analysis.

Method: In this work, we present a semi-supervised LDA-based approach to automatically classify change messages. We use domain knowledge of software changes to create labeled samples, which are added to build the semi-supervised LDA model. Next, we verify the cross-project analysis application of our method on three open-source projects. Our method has two advantages over existing software change classification methods. First, it mitigates the issue of how to set the appropriate number of latent topics: we do not have to choose the number of latent topics in our method, because it corresponds to the number of class labels. Second, the approach utilizes the information provided by the labeled samples in the training set.

Results: Our method automatically classified about 85% of the change messages in our experiment, and our validation survey showed that our automatic classification results agreed with developer opinions 70.56% of the time.

Conclusion: Our approach automatically classifies most of the change messages that record the cause of a software change, and the method is applicable to cross-project analysis of software change messages.
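The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way a semi-supervised LDA classifier of this kind could work, not the authors' implementation. It assumes a collapsed Gibbs sampler in which the number of topics equals the number of class labels and every token of a labeled message is clamped to the topic of its label. The function name `classify_messages`, the hyperparameters `alpha` and `beta`, and the example labels `corrective` and `feature` are all hypothetical.

```python
import random


def classify_messages(labeled, unlabeled, labels, iters=200,
                      alpha=0.1, beta=0.01, seed=0):
    """Hypothetical sketch: one topic per class label, labeled messages clamped.

    labeled   -- list of (tokens, label) pairs built from domain knowledge
    unlabeled -- list of token lists to classify
    labels    -- list of class labels; its length fixes the number of topics
    """
    rng = random.Random(seed)
    num_topics = len(labels)  # no separate choice of topic count is needed
    docs = [tokens for tokens, _ in labeled] + list(unlabeled)
    # Topic index for labeled messages, None for messages to be inferred.
    clamp = [labels.index(lab) for _, lab in labeled] + [None] * len(unlabeled)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            # Labeled tokens start (and stay) in their label's topic.
            k = clamp[d] if clamp[d] is not None else rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
        assign.append(row)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            if clamp[d] is not None:
                continue  # labeled messages are never resampled
            for i, w in enumerate(doc):
                v, k = vocab[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Standard collapsed Gibbs conditional for LDA.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Each unlabeled message takes the label of its dominant topic.
    first = len(labeled)
    return [labels[max(range(num_topics), key=lambda t: doc_topic[d][t])]
            for d in range(first, len(docs))]
```

Calling `classify_messages(labeled, unlabeled, ["corrective", "feature"])` with a few tokenized seed messages per label returns one predicted label per unlabeled message. Clamping the labeled messages is what ties each topic to a class, which is why no separate topic count has to be tuned in this sketch.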

Bibliographic details

  • Source
    Information and Software Technology | 2015, Issue 1 | pp. 369-377 | 9 pages
  • Author affiliations

    School of Software Engineering, Chongqing University, Chongqing 401331, PR China;

    School of Software Engineering, Chongqing University, Chongqing 401331, PR China;

    Key Laboratory of Dependable Service Computing in Cyber Physical Society Ministry of Education, Chongqing 400044, PR China, School of Software Engineering, Chongqing University, Huxi Town, Shapingba, Chongqing 401331, PR China;

    School of Software Engineering, Chongqing University, Chongqing 401331, PR China;

    School of Software Engineering, Chongqing University, Chongqing 401331, PR China;

    School of Software Engineering, Chongqing University, Chongqing 401331, PR China;

  • Original format: PDF
  • Text language: English
  • Keywords

    Software repositories mining; Semi-supervised topic modeling; LDA; Change message;
