IEEE International Workshop on Software Clones

A picture is worth a thousand words: Code clone detection based on image similarity



Abstract

This paper introduces a new code clone detection technique based on image similarity. The technique captures the visual perception of code as seen by humans in an IDE by applying syntax highlighting and image conversion to raw source code text. We compared two similarity measures, Jaccard similarity and earth mover's distance (EMD), for our image-based code clone detection technique. Jaccard similarity offered better detection performance than EMD. The F1 score of our technique on detecting Java clones with pervasive code modifications is comparable to that of five well-known code clone detectors: CCFinderX, Deckard, iClones, NiCad, and Simian. A Gaussian blur filter is chosen as a normalisation technique for type-2 and type-3 clones. We found that blurring code images before similarity computation resulted in higher precision and recall. The detection performance after including the blur filter increased by 1 to 6 percent. A manual investigation of clone pairs in three software systems revealed that our technique, while it missed some of the true clones, could also detect additional true clone pairs missed by NiCad.
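
The pipeline described in the abstract (highlight, convert to an image, blur, compare with Jaccard) can be illustrated with a small standard-library Python sketch. This is not the authors' implementation, and everything specific in it is an assumption made for illustration: the keyword set, the grey-level palette standing in for a syntax-highlighting theme, the one-cell-per-character "image", the 3x3 kernel used as a Gaussian approximation, and the choice to compute Jaccard over bucketed intensity histograms. The abstract does not say how pixels are compared.

```python
import re
from collections import Counter

# Hypothetical palette: one grey level per token class (an assumption that
# stands in for a real syntax-highlighting theme).
KEYWORDS = {"int", "return", "if", "else", "for", "while", "class", "void"}
PALETTE = {"keyword": 220, "identifier": 150, "number": 100, "other": 60}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# 3x3 approximation of a Gaussian kernel; the weights sum to 16.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def render(code, width=40):
    """Turn source text into a 2-D grid of grey levels, one cell per character."""
    image = []
    for line in code.splitlines():
        row = [0] * width  # 0 is the background
        for match in TOKEN_RE.finditer(line[:width]):
            text = match.group()
            if text in KEYWORDS:
                kind = "keyword"
            elif text[0].isalpha() or text[0] == "_":
                kind = "identifier"
            elif text[0].isdigit():
                kind = "number"
            else:
                kind = "other"
            for col in range(match.start(), match.end()):
                row[col] = PALETTE[kind]
        image.append(row)
    return image


def gaussian_blur(image):
    """Apply the 3x3 kernel; cells outside the image count as background (0)."""
    height = len(image)
    width = len(image[0]) if image else 0
    blurred = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += KERNEL[dy + 1][dx + 1] * image[yy][xx]
            blurred[y][x] = total // 16
    return blurred


def jaccard(image_a, image_b, bucket=32):
    """Multiset Jaccard over bucketed intensity histograms (an assumed variant)."""
    hist_a = Counter(p // bucket for row in image_a for p in row)
    hist_b = Counter(p // bucket for row in image_b for p in row)
    intersection = sum((hist_a & hist_b).values())  # per-bucket minimum
    union = sum((hist_a | hist_b).values())         # per-bucket maximum
    return intersection / union if union else 1.0


def similarity(code_a, code_b, blur=True):
    """Render both fragments, optionally blur them, and compare."""
    img_a, img_b = render(code_a), render(code_b)
    if blur:
        img_a, img_b = gaussian_blur(img_a), gaussian_blur(img_b)
    return jaccard(img_a, img_b)


original = "int sum(int a, int b) {\n    return a + b;\n}"
renamed = "int add(int x, int y) {\n    return x + y;\n}"
print(similarity(original, renamed))
```

In this toy renderer, identifiers renamed to names of the same length produce identical grids, so it only hints at how a visual representation can absorb type-2 changes. A rendering with real glyphs, as in an IDE screenshot, would differ at the pixel level, which is where a blur step would have more to do.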
