Mining Software Repositories for Accurate Authorship



Abstract

Code authorship information is important for analyzing software quality, performing software forensics, and improving software maintenance. However, current tools assume that the last developer to change a line of code is its author regardless of all earlier changes. This approximation loses important information. We present two new line-level authorship models to overcome this limitation. We first define the repository graph as a graph abstraction for a code repository, in which nodes are the commits and edges represent the development dependencies. Then for each line of code, structural authorship is defined as a subgraph of the repository graph recording all commits that changed the line and the development dependencies between the commits; weighted authorship is defined as a vector of author contribution weights derived from the structural authorship of the line and based on a code change measure between commits, for example, best edit distance. We have implemented our two authorship models as a new git built-in tool git-author. We evaluated git-author in an empirical study and a comparison study. In the empirical study, we ran git-author on five open source projects and found that git-author can recover more information than a current tool (git-blame) for about 10% of lines. In the comparison study, we used git-author to build a line-level model for bug prediction. We compared our line-level model with a representative file-level model. The results show that our line-level model performs consistently better than the file-level model when evaluated on our data sets produced from the Apache HTTP server project.
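The weighted authorship idea described in the abstract can be illustrated with a small sketch. This is not the git-author implementation: the `LineChange` type, the `weighted_authorship` function, the use of character-level Levenshtein distance as the change measure, and the rule of taking the smallest distance when a commit has several parents are all assumptions made for illustration. The input is a hand-written structural authorship for one line, that is, the commits that changed the line together with the earlier commits each one depends on.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineChange:
    """One node of a line's structural authorship (hypothetical type)."""
    commit: str          # commit identifier
    author: str          # developer who made the commit
    text: str            # content of the line after this commit
    parents: tuple = ()  # earlier commits in the subgraph this one depends on


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def weighted_authorship(changes):
    """Map each author to their share of the total change made to the line."""
    by_commit = {c.commit: c for c in changes}
    raw = defaultdict(float)
    for c in changes:
        if c.parents:
            # Assumption: with several parents, charge the smallest distance.
            cost = min(edit_distance(by_commit[p].text, c.text)
                       for p in c.parents)
        else:
            # The line was introduced by this commit.
            cost = edit_distance("", c.text)
        raw[c.author] += cost
    total = sum(raw.values())
    return {a: w / total for a, w in raw.items()} if total else dict(raw)


# Toy history: alice writes the line, bob later changes one character.
history = [
    LineChange("c1", "alice", "int x = 0;"),
    LineChange("c2", "bob", "int x = 1;", parents=("c1",)),
]
weights = weighted_authorship(history)
```

In this toy history a last-change model would credit the line to bob alone, while the weight vector keeps most of the credit with alice, who wrote ten of the eleven changed characters.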


Bibliographic details

Source: IEEE International Conference on Software Maintenance | 2013 | 10 pages
Authors: Xiaozhu Meng; Barton P. Miller; William R. Williams; Andrew R. Bernat
Original format: PDF
CLC classification: TP311.5-53

