Ctcompare: Code clone detection using hashed token sequences


Abstract

There is much research on the use of tokenized source code to find code clones both within and between trees of source code. Some approaches have used suffix trees [1], [3]; others have used variations of longest common substring algorithms [4], [5]. This paper outlines an algorithm, embodied in a new tool called ctcompare, that takes a different tokenization approach. Each code base to be compared is first lexically analysed to produce a sequence of tokens. These are then broken into overlapping tuples of N consecutive tokens. The tuples are then hashed and the hash values of token tuples are used to identify type-1 and type-2 clone pairs. Hashed token sequences combined with a database have already been used in earlier ctcompare versions and elsewhere [2], but with a significant performance penalty due to database insertions. The benefits of this approach over the existing research include the simultaneous comparison of multiple large code bases and fast absolute performance.
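The pipeline described in the abstract (tokenize, split into overlapping tuples of N tokens, hash each tuple, and look for shared hash values) can be illustrated with a minimal sketch. This is not ctcompare's actual implementation: the regex lexer, the keyword set, the `ID`/`NUM` placeholders used to approximate type-2 matching, and all function names are assumptions made for illustration, and Python's built-in `hash` stands in for whatever hash function the tool uses.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny lexer: identifiers, integers, or any single
# non-space character. A real tool would use a full per-language lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # assumed subset


def tokenize(source, normalize=False):
    """Return the token sequence for a source string.

    With normalize=True, identifiers and numbers are replaced by the
    placeholders ID and NUM, so renamed copies (type-2) hash identically.
    """
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if normalize and tok not in KEYWORDS:
            if tok[0].isalpha() or tok[0] == "_":
                tok = "ID"
            elif tok[0].isdigit():
                tok = "NUM"
        tokens.append(tok)
    return tokens


def tuple_hashes(tokens, n):
    """Yield (position, hash) for each overlapping run of n consecutive tokens."""
    for i in range(len(tokens) - n + 1):
        yield i, hash(tuple(tokens[i:i + n]))


def candidate_clones(sources, n=8, normalize=False):
    """Group tuple locations by hash value across all code bases.

    sources maps a name to its source text. Each returned group lists the
    (name, position) pairs whose n-token tuples share a hash. These are
    candidates only: a hash collision is possible, so a real tool would
    confirm each match by comparing the tokens themselves.
    """
    index = defaultdict(list)
    for name, text in sources.items():
        for pos, h in tuple_hashes(tokenize(text, normalize), n):
            index[h].append((name, pos))
    return [locs for locs in index.values() if len(locs) > 1]
```

Calling `candidate_clones({"a": text_a, "b": text_b}, n=8)` compares exact token runs (type-1 candidates), while passing `normalize=True` also groups runs that differ only in identifier names and numeric literals (type-2 candidates). Because every code base feeds the same in-memory index, adding a third or fourth entry to `sources` compares all of them in one pass.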


Bibliographic details

Source: 2012 6th International Workshop on Software Clones (IWSC), 2012, pp. 92-93 (2 pages)
Conference location: Zurich (CH)
Author: Toomey, Warren
Author affiliation: School of IT, Bond University, Robina, Qld., Australia
Original format: PDF
Language: English
Chinese Library Classification: Computer software

Similar literature

1. CCFinder: a multilinguistic token-based code clone detection system for large scale source code [J]. Kamiya T., Kusumoto S., Inoue K. IEEE Transactions on Software Engineering, 2002, No. 7
2. Code Clone Detection Method Based on the Combination of Tree-Based and Token-Based Methods [J]. Ryota Ami, Hirohide Haga. Journal of Software Engineering and Applications, 2017, No. 13
3. A token-based code clone detection technique and its evaluation [J]. Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue. 電子情報通信学会技術研究報告. ソフトウェアサイエンス. Software Science, 2000, No. 570
4. Ctcompare: Code clone detection using hashed token sequences [C]. Toomey Warren. International Workshop on Software Clones, 2012
5. Coding videophone sequences at a better perceptual quality by using face detection and MPEG coding [D]. Cascante, Eduardo Cardoce. 2006
6. Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics [O]. Marlon Stoeckius, Shiwei Zheng, Brian Houck-Loomis, 2018
7. CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code [O]. Inoue Katsuro. 2015
