In-memory fuzzing for binary code similarity analysis


Abstract

Detecting similar functions in binary executables serves as a foundation for many binary code analysis and reuse tasks. To date, recognizing similar components in binary code remains a challenge. Existing research employs either static or dynamic approaches to capture syntax-level or semantics-level program features for comparison. However, previous work has multiple design limitations that result in relatively high cost, low accuracy, and poor scalability, and thus severely impede its practical use. In this paper, we present a novel method that leverages in-memory fuzzing for binary code similarity analysis. Our prototype tool, IMF-SIM, applies in-memory fuzzing to analyze every function and collect traces of different kinds of program behaviors. The similarity score of two behavior traces is computed from their longest common subsequence. To compare two functions, a feature vector is generated whose elements are the similarity scores of the trace-level comparisons. We train a machine learning model on labeled feature vectors; later, given a feature vector obtained by comparing two functions, the trained model outputs a final score representing the similarity of the two functions. We evaluate IMF-SIM on binaries produced by different compilers, optimizations, and commonly used obfuscation methods, over one thousand binary executables in total. Our evaluation shows that IMF-SIM notably outperforms existing tools, with higher accuracy and a broader application scope.
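The trace comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not IMF-SIM's implementation: the abstract states only that the score is based on the longest common subsequence (LCS), so the normalization by the longer trace, the treatment of empty traces, and the behavior-kind names used in the usage note are all assumptions made for illustration. The final machine learning step is not shown.

```python
from typing import Dict, Hashable, List, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def trace_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Similarity of two behavior traces in [0, 1].

    Assumption: the LCS length is divided by the length of the longer trace,
    and two empty traces are treated as identical. The abstract does not
    specify the normalization.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return lcs_length(a, b) / longest


def feature_vector(
    traces_f: Dict[str, Sequence[Hashable]],
    traces_g: Dict[str, Sequence[Hashable]],
    kinds: List[str],
) -> List[float]:
    """One similarity score per behavior kind, in the order given by kinds.

    The kind names are hypothetical; a missing kind is treated as an empty trace.
    """
    return [
        trace_similarity(traces_f.get(kind, []), traces_g.get(kind, []))
        for kind in kinds
    ]
```

As a usage example, with hypothetical behavior kinds `"lib_calls"` and `"mem_writes"`, calling `feature_vector(traces_f, traces_g, ["lib_calls", "mem_writes"])` yields a two-element vector. Vectors like this, labeled as similar or dissimilar, would be the input on which a classifier is trained to produce the final function-level score.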


Bibliographic record

Source: IEEE/ACM International Conference on Automated Software Engineering, 2017, pp. 319-330 (12 pages)
Authors: Shuai Wang; Dinghao Wu
Original format: PDF
Keywords: Binary codes; Tools; Runtime; Indexes; Syntactics

