Conference: IEEE/ACM International Conference on Automated Software Engineering

In-memory fuzzing for binary code similarity analysis

Abstract

Detecting similar functions in binary executables is a foundation for many binary code analysis and reuse tasks. To date, recognizing similar components in binary code remains a challenge. Existing research employs either static or dynamic approaches to capture syntax-level or semantics-level program features for comparison. However, previous work has multiple design limitations that lead to relatively high cost, low accuracy, and poor scalability, and thus severely impede practical use. In this paper, we present a novel method that leverages in-memory fuzzing for binary code similarity analysis. Our prototype tool, IMF-SIM, applies in-memory fuzzing to analyze every function and collect traces of different kinds of program behaviors. The similarity score of two behavior traces is computed from their longest common subsequence. To compare two functions, a feature vector is generated whose elements are the similarity scores of the trace-level comparisons. We train a machine learning model on labeled feature vectors; later, given the feature vector produced by comparing two functions, the trained model outputs a final score that represents the similarity of the two functions. We evaluate IMF-SIM on binaries compiled with different compilers, optimizations, and commonly used obfuscation methods, over one thousand binary executables in total. Our evaluation shows that IMF-SIM notably outperforms existing tools, with higher accuracy and broader application scope.
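
The trace comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the IMF-SIM implementation. Several details are assumptions made for illustration only: traces are modeled as lists of integers, the LCS length is normalized by the length of the longer trace, one score is produced per behavior kind, and the behavior kind names (`mem_reads`, `mem_writes`, `lib_calls`) are hypothetical. The abstract does not specify any of these. The fuzzing and the trained model are out of scope here.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def trace_similarity(a: Sequence, b: Sequence) -> float:
    """Score in [0, 1]; normalizing by the longer trace is an assumption of this sketch."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def feature_vector(
    f1: Dict[str, List[int]], f2: Dict[str, List[int]], kinds: Sequence[str]
) -> List[float]:
    """One similarity score per behavior kind, in the order given by `kinds`."""
    return [trace_similarity(f1.get(k, []), f2.get(k, [])) for k in kinds]


# Hypothetical behavior kinds and toy traces for two functions.
KINDS = ("mem_reads", "mem_writes", "lib_calls")
func_a = {"mem_reads": [1, 2, 3, 4], "mem_writes": [7, 8], "lib_calls": [10, 11]}
func_b = {"mem_reads": [1, 3, 4, 5], "mem_writes": [7, 8], "lib_calls": [11, 10]}

print(feature_vector(func_a, func_b, KINDS))  # [0.75, 1.0, 0.5]
```

In the pipeline the abstract describes, a vector like this would be passed to the trained model, which produces the final similarity score for the function pair.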