IEEE International Conference on Software Maintenance

Mining Software Repositories for Accurate Authorship

Abstract

Code authorship information is important for analyzing software quality, performing software forensics, and improving software maintenance. However, current tools assume that the last developer to change a line of code is its author regardless of all earlier changes. This approximation loses important information. We present two new line-level authorship models to overcome this limitation. We first define the repository graph as a graph abstraction for a code repository, in which nodes are the commits and edges represent the development dependencies. Then for each line of code, structural authorship is defined as a subgraph of the repository graph recording all commits that changed the line and the development dependencies between the commits; weighted authorship is defined as a vector of author contribution weights derived from the structural authorship of the line and based on a code change measure between commits, for example, best edit distance. We have implemented our two authorship models as a new git built-in tool git-author. We evaluated git-author in an empirical study and a comparison study. In the empirical study, we ran git-author on five open source projects and found that git-author can recover more information than a current tool (git-blame) for about 10% of lines. In the comparison study, we used git-author to build a line-level model for bug prediction. We compared our line-level model with a representative file-level model. The results show that our line-level model performs consistently better than the file-level model when evaluated on our data sets produced from the Apache HTTP server project.
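To make the two models concrete, the following is a minimal sketch and not the git-author implementation. It assumes a line's structural authorship is given as a small graph: a hypothetical `commits` mapping from commit id to (author, line text) and a hypothetical `parents` mapping that records the development dependencies. Weighted authorship is then approximated by crediting each author with the Levenshtein distance between their version of the line and its predecessor, normalized to sum to 1. Taking the minimum distance over several parents, and crediting the full line length to the commit that introduces the line, are assumptions made here for illustration; the abstract does not specify these details.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def weighted_authorship(commits, parents):
    """Return author -> weight for one line of code.

    commits: commit id -> (author, text of the line after that commit)
    parents: commit id -> list of parent commit ids inside the line's subgraph
    Both structures are hypothetical stand-ins for structural authorship.
    """
    raw = defaultdict(int)
    for cid, (author, text) in commits.items():
        preds = parents.get(cid, [])
        if not preds:
            # Assumption: the introducing commit is credited with the whole line.
            raw[author] += len(text)
        else:
            # Assumption: with several parents, use the smallest change.
            raw[author] += min(edit_distance(commits[p][1], text) for p in preds)
    total = sum(raw.values())
    return {author: w / total for author, w in raw.items()} if total else {}


# Structural authorship of one line: three commits in a chain c1 -> c2 -> c3.
commits = {
    "c1": ("alice", "int x = 0;"),
    "c2": ("bob", "int x = 1;"),
    "c3": ("alice", "long x = 1;"),
}
parents = {"c2": ["c1"], "c3": ["c2"]}

weights = weighted_authorship(commits, parents)
for author, weight in sorted(weights.items()):
    print(f"{author}: {weight:.3f}")
```

In this example a last-change-wins tool would attribute the line entirely to alice because of commit c3, whereas the weight vector also keeps bob's small contribution from c2.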
