IEEE International Conference on Software Maintenance and Evolution

CCLearner: A Deep Learning-Based Clone Detection Approach

Abstract

Programmers produce code clones when developing software. By copying and pasting code with or without modification, developers reuse existing code to improve programming productivity. However, code clones present challenges to software maintenance: they may require consistent application of the same or similar bug fixes or program changes to multiple code locations. To simplify the maintenance process, various tools have been proposed to automatically detect clones [1], [2], [3], [4], [5], [6]. Some tools tokenize source code and then compare the sequence or frequency of tokens to reveal clones [1], [3], [4], [5]. Other tools detect clones by using tree-matching algorithms to compare the Abstract Syntax Trees (ASTs) of source code [2], [6]. In this paper, we present CCLearner, the first solely token-based clone detection approach leveraging deep learning. CCLearner extracts tokens from known method-level code clones and non-clones to train a classifier, and then uses the classifier to detect clones in a given codebase. To evaluate CCLearner, we reused BigCloneBench [7], an existing large benchmark of real clones. We used part of the benchmark for training and the other part for testing, and observed that CCLearner effectively detected clones. With the same data set, we conducted the first systematic comparison experiment between CCLearner and three popular clone detection tools. Compared with the approaches not using deep learning, CCLearner achieved competitive clone detection effectiveness with low time cost.
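To make the token-based idea concrete, the following is a minimal sketch of how token frequencies from a pair of methods could be turned into numeric features for a clone/non-clone classifier. It is an illustration only: the regex tokenizer, the three token categories, the similarity formula, and the names `category` and `pair_features` are assumptions made for this sketch and are not taken from the abstract. Training the classifier itself is omitted.

```python
import re
from collections import Counter

# Hypothetical, simplified tokenizer: identifiers/keywords, integer literals,
# and any other single non-whitespace character (operators, brackets, ...).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Assumed token categories for this sketch only.
CATEGORIES = ("word", "number", "symbol")


def category(token):
    """Assign a token to one of the assumed categories."""
    if token[0].isalpha() or token[0] == "_":
        return "word"
    if token[0].isdigit():
        return "number"
    return "symbol"


def pair_features(code_a, code_b):
    """Return one token-frequency similarity score in [0, 1] per category.

    A score of 1.0 means both fragments use the tokens of that category
    with identical frequencies (or neither fragment has such tokens).
    """
    counts_a = Counter(TOKEN_RE.findall(code_a))
    counts_b = Counter(TOKEN_RE.findall(code_b))
    features = {}
    for cat in CATEGORIES:
        tokens = {t for t in counts_a.keys() | counts_b.keys() if category(t) == cat}
        total = sum(counts_a[t] + counts_b[t] for t in tokens)
        diff = sum(abs(counts_a[t] - counts_b[t]) for t in tokens)
        features[cat] = 1.0 if total == 0 else 1.0 - diff / total
    return features


if __name__ == "__main__":
    method_a = "int add(int a, int b) { return a + b; }"
    method_b = "int sum(int x, int y) { return x + y; }"
    # Renamed identifiers lower the "word" score; the "symbol" score is unaffected.
    print(pair_features(method_a, method_b))
```

Feature vectors of this kind, computed for labeled clone and non-clone pairs, could serve as training input to a classifier; the network architecture and training procedure used by CCLearner are not described in the abstract and are not reproduced here.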