Journal: Information and Software Technology

Data stream mining for predicting software build outcomes using source code metrics


Abstract

Context: Software development projects use a wide range of tools to produce a software artifact. Software repositories such as source control systems have become a focus of emerging research because they are a rich source of information about software development projects. Mining such repositories is increasingly common as a way to gain a deeper understanding of the development process.

Objective: This paper explores the idea of representing a software development project as a process that produces a data stream. It also describes the extraction of metrics from the Jazz repository and the application of data stream mining techniques to identify metrics that are useful for predicting build success or failure.

Method: This research is a systematic study that uses the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift, both applied through the Massive Online Analysis (MOA) tool.

Results: The results indicate that only a relatively small number of the available measures have any significance for predicting the outcome of a build over time. These significant measures are identified and the implications of the results are discussed, particularly the relative difficulty of predicting failed builds. The Hoeffding Tree approach is shown to produce a more stable and robust model than traditional data mining approaches.

Conclusion: Overall prediction accuracies of 75% have been achieved with the Hoeffding Tree classification method. Despite this high overall accuracy, failure is harder to predict than success. The emergence of a stable classification tree is limited by the lack of data, but overall the approach shows promise for informing software development activities in order to minimize the chance of failure.
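The Hoeffding Tree method named in the abstract grows a decision tree incrementally: a leaf is split only once enough stream examples have arrived for the Hoeffding bound to separate the best candidate attribute from the runner-up. The following is a minimal sketch of that split test, not the paper's or MOA's implementation. The function names, the tie threshold, and the numbers in the usage lines are hypothetical and chosen for illustration; the range of 1.0 assumes information gain on a two-class target (build success or failure).

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of
    a variable with the given range lies within epsilon of the mean observed
    over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf has seen enough examples to split.

    Split when the gap between the two best attributes exceeds the bound,
    or when the bound is so small that the two are effectively tied.
    tie_threshold is an illustrative value, not one taken from the paper.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


# Hypothetical gains for two candidate source code metrics at one leaf.
print(should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=200))
print(should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=200))
```

Because the bound shrinks as the number of examples grows, a leaf that cannot justify a split early in the stream may do so later. This is consistent with the abstract's remark that a stable tree is limited by the amount of available data.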
