Semantics-Based Obfuscation-Resilient Binary Code Similarity Comparison with Applications to Software and Algorithm Plagiarism Detection

Lannan Luo; Jiang Ming; Dinghao Wu; Peng Liu; Sencun Zhu

Abstract

Existing code similarity comparison methods, whether based on source or binary code, are mostly not resilient to obfuscation. Identifying similar or identical code fragments among programs is very important in some applications. One application is detecting illegal code reuse; in code theft cases, emerging obfuscation techniques have made automated detection increasingly difficult. Another application is identifying cryptographic algorithms, which modern malware widely employs to circumvent detection, hide network communications, and protect payloads, among other purposes. Because of diverse coding styles and high programming flexibility, different implementations of the same algorithm may look very different, which makes automatic detection very hard, especially when code obfuscation is also applied. In this paper, we propose a binary-oriented, obfuscation-resilient binary code similarity comparison method based on a new concept, the longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with fuzzy matching based on the longest common subsequence. We model the semantics of a basic block by a set of symbolic formulas representing the block's input-output relations, so the semantic equivalence (and similarity) of two blocks can be checked by a theorem prover. We then model the semantic similarity of two paths as the longest common subsequence with basic blocks as elements. This novel combination yields strong resilience to code obfuscation. We have developed a prototype, and the experimental results show that our method can be applied to software plagiarism and algorithm detection and is effective and practical for analyzing real-world software.
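The core idea above, scoring two execution paths by the longest common subsequence whose elements match when two basic blocks are semantically equivalent, can be sketched in Python. This is a minimal illustration under loud assumptions, not the authors' implementation: blocks are modeled as hypothetical Python functions over tiny 4-bit inputs rather than symbolic formulas lifted from binary code, and `blocks_equivalent` substitutes exhaustive input testing for the theorem-prover query the paper describes. The names `blocks_equivalent`, `semantic_lcs`, `BITS`, and the example blocks are all illustrative.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical toy model of a basic block: a function from input register
# values to a tuple of output values listed in a fixed, canonical order.
# (The paper instead derives symbolic formulas from the binary code.)
Block = Callable[..., Tuple[int, ...]]

BITS = 4                  # assumed register width, kept tiny so checks are exhaustive
MASK = (1 << BITS) - 1


def blocks_equivalent(a: Block, b: Block, n_inputs: int) -> bool:
    """Stand-in for a theorem-prover query: compare the input-output
    behavior of two blocks on every BITS-bit assignment of the inputs."""
    for args in product(range(MASK + 1), repeat=n_inputs):
        out_a = tuple(v & MASK for v in a(*args))
        out_b = tuple(v & MASK for v in b(*args))
        if out_a != out_b:
            return False
    return True


def semantic_lcs(path1: List[Block], path2: List[Block], n_inputs: int) -> int:
    """Length of the longest common subsequence of two block sequences,
    where two blocks match when they are semantically equivalent."""
    m, n = len(path1), len(path2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if blocks_equivalent(path1[i - 1], path2[j - 1], n_inputs):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


# Hypothetical original path: three blocks over inputs (x, y).
original = [
    lambda x, y: (x ^ y,),
    lambda x, y: (x * 2,),
    lambda x, y: (x + y,),
]

# Hypothetical "obfuscated" path: instruction substitution plus a junk block.
obfuscated = [
    lambda x, y: ((x | y) - (x & y),),   # x ^ y, rewritten
    lambda x, y: (x * 0 + 7,),           # inserted junk block (constant output)
    lambda x, y: (x << 1,),              # x * 2, rewritten
    lambda x, y: (x - (-y),),            # x + y, rewritten
]

if __name__ == "__main__":
    print(semantic_lcs(original, obfuscated, n_inputs=2))  # 3
```

Because blocks are matched by input-output behavior rather than by instruction text, the rewritten blocks (for example `x << 1` in place of `x * 2`) still line up, while the inserted junk block is simply skipped by the subsequence. Exhaustive checking only scales to toy widths; the paper's method relies on a theorem prover over symbolic formulas instead.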


Bibliographic Information

Source: IEEE Transactions on Software Engineering, 2017, Issue 12, pp. 1157-1177 (21 pages)
Authors: Lannan Luo; Jiang Ming; Dinghao Wu; Peng Liu; Sencun Zhu
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Semantics; Software development; Plagiarism; Binary codes; Software algorithms; Syntactics; Computational modeling

Similar Articles

1. Deviation-Based Obfuscation-Resilient Program Equivalence Checking With Application to Software Plagiarism Detection [J]. Jiang Ming, Fangfang Zhang, Dinghao Wu. IEEE Transactions on Reliability, 2016, Issue 4.
2. Source Code Plagiarism Detection Using Biological String Similarity Algorithms [J]. Imad Rahal, Colin Wielga. Journal of Information & Knowledge Management, 2014, Issue 3.
3. An Iterative Genetic Algorithm Based Source Code Plagiarism Detection Approach Using NCRR Similarity Measure [J]. M. Bhavani, K. Thammi Reddy, P. Suresh Varma. Journal of Theoretical and Applied Information Technology, 2018, Issue 3.
4. A Code Comparison Algorithm Based on AST for Plagiarism Detection [C]. Feng Jianglang, Cui Baojiang, Xia Kunfeng. 2013 Fourth International Conference on Emerging Intelligent Data and Web Technologies, 2013.
5. Obfuscation-Resilient Code Detection Analyses for Android Apps [D]. Yan Wang. 2018.
6. Biochemia Medica has started using the CrossCheck plagiarism detection software powered by iThenticate [O]. Vesna Šupak-Smolčić, Ana-Maria Šimundić. 2013.
7. A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison [O]. Yikun Hu, Hui Wang, Yuanyuan Zhang. 2021.
