Journal: Science of Computer Programming

SeByte: Scalable clone and similarity search for bytecode

Abstract

While source code clone detection is a well-established research area, finding similar code fragments in binary and other intermediate code representations has not yet been widely studied. In this paper, we introduce SeByte, a bytecode clone detection and search model that applies semantic-enabled token matching. It is based on the idea of relaxing the code fingerprints. This approach separates the input content into different dimensions according to token type, with each dimension representing the input content from a specific point of view. SeByte then compares each dimension separately and independently, which we refer to as multi-dimensional comparison in our research. As the similarity search function, we use a well-known measure that supports our multi-dimensional comparison heuristic: the Jaccard similarity coefficient. Our preliminary study shows that SeByte can detect clones that are missed by existing approaches because of differences in the input data and the search algorithm. We then further exploit the model to build a scalable bytecode clone search engine. This extension meets the requirements of a classical search engine, including the ranking of result sets. Our evaluation on a large dataset of 500,000 compiled Java classes, which we extracted from the six most recent versions of the Eclipse IDE, showed that SeByte search is not only scalable but also capable of providing a reliable ranking.
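
The multi-dimensional comparison described in the abstract can be illustrated with a minimal Python sketch. It is not SeByte's implementation. The dimension names (`method_calls`, `types`), the `Fingerprint` layout, the treatment of empty sets, and the use of an unweighted mean to rank results are all assumptions made for illustration; the abstract only states that each dimension is compared independently with the Jaccard similarity coefficient.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical fingerprint layout: dimension name -> set of tokens of that type.
Fingerprint = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    union = a | b
    if not union:
        # Assumption: two empty dimensions contribute no evidence of similarity.
        return 0.0
    return len(a & b) / len(union)


def compare(fp_a: Fingerprint, fp_b: Fingerprint) -> Dict[str, float]:
    """Compare two fingerprints dimension by dimension, each independently."""
    dimensions = sorted(fp_a.keys() | fp_b.keys())
    return {d: jaccard(fp_a.get(d, set()), fp_b.get(d, set())) for d in dimensions}


def rank(query: Fingerprint, corpus: Dict[str, Fingerprint]) -> List[Tuple[str, float]]:
    """Rank corpus entries by the mean of their per-dimension scores.

    The unweighted mean is an assumption; the abstract does not say how
    per-dimension results are combined into a ranking.
    """
    scored = []
    for name, fingerprint in corpus.items():
        scores = compare(query, fingerprint)
        mean = sum(scores.values()) / len(scores) if scores else 0.0
        scored.append((name, mean))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative data only; these class names and tokens are made up.
query = {
    "method_calls": {"java/util/List.add", "java/util/List.size"},
    "types": {"java/util/List", "java/lang/String"},
}
corpus = {
    "ClassA": {
        "method_calls": {"java/util/List.add", "java/util/List.size"},
        "types": {"java/util/List"},
    },
    "ClassB": {
        "method_calls": {"java/io/File.exists"},
        "types": {"java/io/File"},
    },
}
ranking = rank(query, corpus)
```

Keeping the per-dimension scores separate until the final step means a candidate that matches on one token type but differs on another is still visible in the results, which is one way to read the "relaxation" of fingerprints mentioned above.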