
Generated code in studies on clone rates


Abstract

Various earlier studies have measured clone rates for diverse projects. One reason for exceptionally high clone rates in individual source files was found to be auto-generated code. Automatically generated code is generally not maintained and, hence, should be excluded from clone-rate measurements. Such code might even bias the clone rates of projects when there is a large amount of it and when clone rates for generated files deviate from the average clone rate for handwritten code. While some generated files stood out with above-average clone rates in earlier studies, we do not know whether this is generally the case or how much code is actually generated automatically. This paper investigates the amount of generated files in projects, whether clone rates for generated files really differ from those for handwritten code, and, overall, whether generated code in fact biases clone rates. We heuristically detect generated files in a very large corpus of open-source projects written in C, C++, C#, or Java and report the number of projects with generated code. For these projects, we compare the clone rates of generated and handwritten files. Our results show higher clone rates for generated files. Moreover, when we aggregate clone rates from files to projects, the clone rates of projects with at least one generated file are also slightly higher than those of projects in which no generated files were detected. Our results suggest that researchers should indeed take special care to exclude generated code in studies on clone rates.
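
The abstract does not spell out the detection heuristics or the exact clone-rate definition, so the following is only a minimal sketch of the general idea under stated assumptions: a file is flagged as generated if a marker phrase appears in its first few lines, and the clone rate is taken as the fraction of lines covered by clones. The marker list, the `header_lines` cutoff, and the names `looks_generated`, `clone_rate`, and `compare_groups` are all hypothetical and are not taken from the paper.

```python
import re
from statistics import mean

# Hypothetical marker phrases; the paper's actual heuristics are not given in the abstract.
GENERATED_MARKERS = re.compile(
    r"(generated by|auto-?generated|automatically generated|do not edit)",
    re.IGNORECASE,
)


def looks_generated(source_text, header_lines=20):
    """Return True if one of the first `header_lines` lines contains a marker phrase."""
    header = "\n".join(source_text.splitlines()[:header_lines])
    return GENERATED_MARKERS.search(header) is not None


def clone_rate(cloned_lines, total_lines):
    """Fraction of lines covered by at least one clone (assumed definition)."""
    return cloned_lines / total_lines if total_lines else 0.0


def compare_groups(files):
    """Compare mean clone rates of generated and handwritten files.

    `files` is an iterable of (source_text, cloned_lines, total_lines) tuples.
    Returns a dict with the mean clone rate per group (None for an empty group).
    """
    groups = {"generated": [], "handwritten": []}
    for text, cloned, total in files:
        key = "generated" if looks_generated(text) else "handwritten"
        groups[key].append(clone_rate(cloned, total))
    return {name: (mean(rates) if rates else None) for name, rates in groups.items()}
```

In such a setup, the clone counts per file would come from a separate clone detector; the sketch only shows how files might be split into the two groups before their clone rates are compared.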
