Journal of Software Maintenance and Evolution

Large-scale inter-system clone detection using suffix trees and hashing


Abstract

Detecting similar code between two systems has various applications, such as comparing two software variants or versions, or finding potential license violations. Techniques for detecting suspiciously similar code must scale to very large code corpora in terms of the resources they need. They must also have high precision, because a human has to inspect the results. This paper demonstrates how suffix trees can be used to obtain a scalable comparison. The evaluation is carried out on very large code corpora. It shows that our approach is faster than index-based techniques when the analysis is run only once. If the analysis is to be conducted multiple times, creating an index pays off. We report how much code can be filtered out of the analysis using an index-based filter. In addition, this paper proposes a method to improve precision through user feedback. A user validates a sample of the found clone candidates, and an automated data mining technique learns a decision tree from the user's decisions and different code metrics. We investigate the relevance of several metrics and whether criteria learned from one application domain can be generalized to other domains. Copyright © 2013 John Wiley & Sons, Ltd.
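The abstract does not spell out the algorithm. As a rough illustration of suffix-based inter-system comparison, here is a minimal sketch. It substitutes a naive generalized suffix array (all suffixes sorted, then common prefixes of neighbours compared) for the paper's suffix tree. It assumes both systems are already tokenized. The names `encode` and `shared_sequences`, the `min_len` threshold, and the example token lists are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    # Map tokens to non-negative ids; negative ids are reserved for separators.
    return [vocab.setdefault(t, len(vocab)) for t in tokens]


def shared_sequences(a: List[str], b: List[str], min_len: int) -> List[Tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for shared token runs of at least min_len.

    Hypothetical sketch: a naive generalized suffix array stands in for a suffix tree.
    """
    vocab: Dict[str, int] = {}
    enc_a, enc_b = encode(a, vocab), encode(b, vocab)
    # Each separator occurs exactly once, so no common prefix of two distinct
    # suffixes can run across the boundary between the systems or past the end.
    text = enc_a + [-1] + enc_b + [-2]
    b_start = len(enc_a) + 1

    def system(i: int) -> str:
        if i < len(enc_a):
            return "a"
        if b_start <= i < len(text) - 1:
            return "b"
        return "sep"

    # Naive construction (quadratic time and memory): acceptable for a sketch only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    results = set()
    for p, q in zip(sa, sa[1:]):
        # Only neighbouring suffixes that come from different systems are of interest.
        if {system(p), system(q)} != {"a", "b"}:
            continue
        # Length of the common prefix of the two neighbouring suffixes.
        k = 0
        while text[p + k] == text[q + k]:
            k += 1
        if k < min_len:
            continue
        i, j = (p, q) if system(p) == "a" else (q, p)
        # Skip matches that extend one more token to the left (not left-maximal).
        if i > 0 and text[i - 1] == text[j - 1]:
            continue
        results.add((i, j - b_start, k))
    return sorted(results)
```

For example, with `a = ["x", "=", "1", ";", "y", "=", "2", ";"]` and `b = ["z"] + a + ["w"]`, `shared_sequences(a, b, 4)` reports the single run `(0, 1, 8)`. The sketch only compares suffixes that are adjacent in the sorted order. If a sequence occurs several times within one system, some cross-system pairs are therefore not reported. A suffix-tree implementation would instead enumerate all leaves below each sufficiently deep internal node.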