Improved methods for mining software repositories to detect evolutionary couplings.

Abstract

The dissertation investigates techniques to improve the results of mining large software repositories for evolutionary couplings. Evolutionary couplings are dependencies between software artifacts that affect how systems are maintained and evolved. They appear as co-changing artifacts in the version history of software systems. Detection of evolutionary couplings has traditionally been done using data mining techniques (e.g., frequent pattern mining). The focus of this research is twofold.

First, new techniques are developed for detecting evolutionary couplings by employing orthogonal information derived using program analysis techniques (e.g., metrics). The goal is to reduce the number of false positives and so improve the quality of detection. These new hybrid approaches include the use of statically derived metrics, repository meta-data (i.e., age and distance), and different transaction sizes. Of the four approaches examined, three produce fewer false positives and higher-quality patterns than the traditional approach to computing evolutionary coupling. The best approach had a precision of 90% in some cases. Slight improvements to recall were also observed. Distance appears to have no clear effect on filtering out false patterns.

Second, an empirical investigation is undertaken of the impact that the parameters of the data mining techniques have on the detection of evolutionary couplings. This provides fundamental evidence for the selection of data mining parameters in the context of software repositories. The parameters studied include a comparison of different transaction sizes. Additionally, a regression prediction model on minimum support, confidence, duration, and training size was constructed to uncover their effects on the generated patterns and association rules. Different transaction sizes affect the quality of the generated association rules. Larger time windows were found to give better prediction accuracy and completeness: the one-week time window gave the best results, and the time window of an individual commit gave the worst. Additionally, a large-scale study of frequent pattern mining parameters showed that the regression models are able to predict outcomes accurately. The confidence parameter has the most dominant effect on the final results.
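
The terms used in the abstract (transactions, time windows, minimum support, confidence) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `group_into_transactions` and `mine_pair_rules`, the restriction to file pairs, the use of an absolute count as support, and the fixed-size windows measured from the earliest commit are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations


def group_into_transactions(commits, window):
    """Merge commits into transactions (sets of co-changed files).

    commits: list of (timestamp, set_of_files) tuples.
    window:  a timedelta, or None to treat every commit as one transaction.
    Assumption: windows are fixed-size buckets counted from the first commit.
    """
    if not commits:
        return []
    if window is None:
        return [set(files) for _, files in commits]
    ordered = sorted(commits, key=lambda c: c[0])
    start = ordered[0][0]
    buckets = {}
    for timestamp, files in ordered:
        index = (timestamp - start) // window  # integer bucket number
        buckets.setdefault(index, set()).update(files)
    return [buckets[i] for i in sorted(buckets)]


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    support:    number of transactions containing both files (absolute count).
    confidence: support divided by the number of transactions
                containing the antecedent.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for files in transactions:
        item_counts.update(files)
        pair_counts.update(combinations(sorted(files), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical commit history, invented for this example.
    commits = [
        (datetime(2014, 1, 6, 10), {"parser.c", "parser.h"}),
        (datetime(2014, 1, 7, 9), {"lexer.c"}),
        (datetime(2014, 1, 8, 15), {"parser.c", "lexer.c"}),
        (datetime(2014, 1, 14, 11), {"parser.c", "parser.h"}),
        (datetime(2014, 1, 15, 11), {"lexer.c"}),
    ]
    for label, window in (("commit", None), ("week", timedelta(days=7))):
        transactions = group_into_transactions(commits, window)
        rules = mine_pair_rules(transactions, min_support=2, min_confidence=0.6)
        print(label, len(transactions), "transactions,", len(rules), "rules")
```

On this toy history, widening the window merges commits into fewer, larger transactions, so more file pairs clear the support and confidence thresholds. The example only shows that transaction size changes the mined rules; it says nothing about which setting is more accurate.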

Bibliographic record

  • Author: Alali, Abdulkareem
  • Author affiliation: Kent State University
  • Degree-granting institution: Kent State University
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 266
  • Original format: PDF
  • Language of text: English