A Dataset of Duplicate Pull-Requests in GitHub

Abstract

In GitHub, the pull-based development model enables community contributors to collaborate more efficiently. However, the distributed and parallel nature of this model creates a risk that developers submit duplicate pull-requests (PRs), which increases the cost of project maintenance. To facilitate further studies that better understand and address the issues introduced by duplicate PRs, we construct a large dataset of historical duplicate PRs extracted from 26 popular open source projects in GitHub using a semi-automatic approach. Furthermore, we present some preliminary applications to illustrate how further research can be conducted based on this dataset.
