Information theoretic evaluation of change prediction models for large-scale software.

Abstract

In this thesis, we first analyze the information generated during the development process, which can be obtained by mining software repositories. We observe that the change data follows a Zipf distribution and exhibits self-similarity. Based on the extracted data, we then develop three probabilistic models to predict which files will have changes or bugs. One purpose of creating these models is to rank the files of the software by how susceptible they are to faults.

The first model is Maximum Likelihood Estimation (MLE), which simply counts the number of events (i.e., changes or bugs) that occur in each file and normalizes the counts to compute a probability distribution. The second model is Reflexive Exponential Decay (RED), in which we postulate that the predictive rate of modification in a file is incremented by any modification to that file and decays exponentially. A new bug occurring in that file adds a new exponential effect to the existing one. The third model is called RED Co-Changes (REDCC). With each modification to a given file, the REDCC model not only increments that file's predictive rate, but also increments the rate for other files that are related to the given file through previous co-changes.

We then present an information-theoretic approach to evaluate the performance of different prediction models. In this approach, the closeness of the model distribution to the actual, unknown probability distribution of the system is measured using cross entropy. We evaluate our prediction models empirically using the proposed information-theoretic approach on six large open source systems. Based on this evaluation, we observe that, of our three prediction models, the REDCC model predicts the distribution that is closest to the actual distribution for all the studied systems. (Abstract shortened by UMI.)
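
The abstract gives no formulas, so the following is only a minimal sketch of how the MLE and RED models and the cross-entropy evaluation might look, not the thesis's actual implementation. All names (`mle_distribution`, `red_distribution`, `cross_entropy`), the `half_life` and `smoothing` parameters, and the toy event history are assumptions made for illustration. Cross entropy is estimated here as the average negative log2 probability that a model assigns to events that actually occur later. REDCC is left out because the abstract does not say how co-change weights are computed.

```python
import math
from collections import Counter


def mle_distribution(events, files, smoothing=1e-6):
    """MLE-style model: count events per file and normalize the counts.

    `events` is a list of (time, file) pairs. `smoothing` is an assumed
    small constant that keeps unseen files from getting probability zero.
    """
    counts = Counter(f for _, f in events)
    total = sum(counts[f] + smoothing for f in files)
    return {f: (counts[f] + smoothing) / total for f in files}


def red_distribution(events, files, now, half_life=30.0, smoothing=1e-6):
    """RED-style model: each past event adds an exponentially decaying term.

    An event at time t contributes exp(-lam * (now - t)) to its file's rate,
    so recent events weigh more. `half_life` is a hypothetical parameter.
    """
    lam = math.log(2) / half_life
    rates = {f: smoothing for f in files}
    for t, f in events:
        if t <= now:
            rates[f] += math.exp(-lam * (now - t))
    total = sum(rates.values())
    return {f: r / total for f, r in rates.items()}


def cross_entropy(model, observed_files):
    """Average negative log2 probability of the events that really happened.

    Lower values mean the model distribution is closer to the observed one.
    """
    return -sum(math.log2(model[f]) for f in observed_files) / len(observed_files)


# Toy history (made up): a.c changed long ago, b.c changed recently.
files = ["a.c", "b.c", "c.c"]
history = [(0, "a.c"), (1, "a.c"), (2, "b.c"), (95, "b.c"), (98, "b.c")]
future = ["b.c", "b.c", "b.c"]  # events observed after time 100

mle = mle_distribution(history, files)
red = red_distribution(history, files, now=100)

print("MLE cross entropy:", cross_entropy(mle, future))
print("RED cross entropy:", cross_entropy(red, future))
```

In this toy history the recent changes to `b.c` dominate the RED rates, so RED assigns `b.c` a higher probability than MLE does and scores a lower cross entropy on the later events. With a different history or decay setting the ordering could reverse.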

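Bibliographic record

  • Author: Askari, Mina.
  • Author affiliation: University of Waterloo (Canada).
  • Degree-granting institution: University of Waterloo (Canada).
  • Subject: Computer Science.
  • Degree: M.Math.
  • Year: 2006
  • Pages: 112 p.
  • Total pages: 112
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology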
