Journal: IEEE Transactions on Software Engineering

Semantics-Based Obfuscation-Resilient Binary Code Similarity Comparison with Applications to Software and Algorithm Plagiarism Detection

Abstract

Existing code similarity comparison methods, whether source-based or binary-based, are mostly not resilient to obfuscation. Identifying similar or identical code fragments among programs is important in several applications. One application is detecting illegal code reuse: in code theft cases, emerging obfuscation techniques have made automated detection increasingly difficult. Another is identifying cryptographic algorithms, which modern malware widely employs to circumvent detection, hide network communications, and protect payloads, among other purposes. Because of diverse coding styles and high programming flexibility, different implementations of the same algorithm may look very distinct, which makes automatic detection hard even before code obfuscation is applied. In this paper, we propose a binary-oriented, obfuscation-resilient method for binary code similarity comparison based on a new concept, the longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with fuzzy matching based on the longest common subsequence. We model the semantics of a basic block as a set of symbolic formulas representing the input-output relations of the block. This way, the semantic equivalence (and similarity) of two blocks can be checked by a theorem prover. We then model the semantic similarity of two paths using the longest common subsequence with basic blocks as elements. This novel combination results in strong resilience to code obfuscation. We have developed a prototype, and the experimental results show that our method can be applied to software plagiarism detection and algorithm detection, and that it is effective and practical for analyzing real-world software.
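The core idea, a longest common subsequence whose elements are basic blocks compared by semantic equivalence rather than by syntax, can be illustrated with a minimal sketch. This is not the authors' prototype. It assumes, purely for illustration, that a basic block is modeled as a Python function of two integer inputs, and it replaces the theorem prover with an exhaustive check over a small input domain. The names `blocks_equivalent`, `lcs_length`, and `path_similarity`, and the choice of normalizing the score by the reference path length, are all hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

# Assumption: a basic block is modeled as a function from two integer
# inputs to one integer output (its input-output relation).
Block = Callable[[int, int], int]


def blocks_equivalent(f: Block, g: Block,
                      domain: Sequence[int] = range(-8, 9)) -> bool:
    """Stand-in for a theorem prover: compare outputs on a small finite domain."""
    return all(f(x, y) == g(x, y) for x, y in product(domain, repeat=2))


def lcs_length(a: Sequence[Block], b: Sequence[Block],
               equivalent: Callable[[Block, Block], bool]) -> int:
    """Classic LCS dynamic program, with `equivalent` in place of `==`."""
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if equivalent(a[i - 1], b[j - 1]):
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def path_similarity(reference: Sequence[Block],
                    target: Sequence[Block]) -> float:
    """Hypothetical score: fraction of reference blocks matched in order."""
    if not reference:
        return 0.0
    return lcs_length(reference, target, blocks_equivalent) / len(reference)


# A reference path of three blocks.
original = [
    lambda x, y: x + y,
    lambda x, y: x * 2,
    lambda x, y: x - y,
]

# The same path after toy "obfuscation": rewritten expressions plus a junk block.
obfuscated = [
    lambda x, y: y + x,     # operands swapped
    lambda x, y: 0,         # inserted junk block
    lambda x, y: x + x,     # x * 2 rewritten
    lambda x, y: x + (-y),  # x - y rewritten
]

# A path with different semantics.
unrelated = [
    lambda x, y: x * y,
    lambda x, y: x + 1,
]

score_obfuscated = path_similarity(original, obfuscated)
score_unrelated = path_similarity(original, unrelated)
```

In this toy setting the inserted junk block is skipped by the subsequence matching, and the rewritten blocks still match because they compute the same outputs. A real system would need symbolic execution and a prover to establish equivalence over all inputs, which the finite check above does not do.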
