Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection

Abstract

Binary code similarity detection is a fundamental technique for many security applications such as vulnerability search, patch analysis, and malware detection. As critical vulnerabilities in IoT devices increase, there is a growing need to detect similar code across architectures for vulnerability search. The variety of IoT hardware architectures and software platforms requires similarity detection to capture the semantic equivalence of code fragments. However, existing approaches are insufficient in capturing semantic similarity. We notice that the abstract syntax tree (AST) of a function contains rich semantic information. Inspired by successful applications of natural language processing technologies in sentence semantic understanding, we propose a deep learning-based AST-encoding method, named ASTERIA, to measure the semantic equivalence of functions on different platforms. Our method leverages the Tree-LSTM network to learn the semantic representation of a function from its AST. Similarity detection can then be conducted efficiently and accurately by measuring the similarity between two representation vectors. We have implemented an open-source prototype of ASTERIA. The Tree-LSTM model is trained on a dataset with 1,022,616 function pairs and evaluated on a dataset with 95,078 function pairs. Evaluation results show that our method outperforms the AST-based tool Diaphora and the state-of-the-art method Gemini by large margins in binary similarity detection. Our method is also several orders of magnitude faster than Diaphora and Gemini in the similarity calculation. In the vulnerability search application, our tool successfully identified 75 vulnerable functions in 5,979 IoT firmware images.
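
The pipeline described in the abstract (encode each function's AST into a vector with a Tree-LSTM, then compare two vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the network variant, node features, training procedure, or similarity function. The sketch below assumes a Child-Sum Tree-LSTM cell without bias terms, untrained random weights, a hypothetical node-type vocabulary (`NODE_TYPES`), and cosine similarity as the comparison measure.

```python
import math
import random

DIM = 8  # hidden size, chosen only for illustration
# Hypothetical AST node-type vocabulary (an assumption, not from the paper)
NODE_TYPES = ["block", "if", "while", "call", "assign", "var", "num", "return"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(*vs):
    return [sum(xs) for xs in zip(*vs)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

# Untrained random parameters; a real model would learn these from function pairs.
EMB = {t: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for t in NODE_TYPES}
W = {g: rand_matrix(DIM, DIM) for g in "ifou"}  # applied to the node embedding
U = {g: rand_matrix(DIM, DIM) for g in "ifou"}  # applied to child hidden states

def encode(node):
    """Encode an AST bottom-up. node = (node_type, [children]); returns (h, c)."""
    node_type, children = node
    x = EMB[node_type]
    states = [encode(child) for child in children]
    h_sum = add(*[h for h, _ in states]) if states else [0.0] * DIM
    i = sigmoid(add(matvec(W["i"], x), matvec(U["i"], h_sum)))  # input gate
    o = sigmoid(add(matvec(W["o"], x), matvec(U["o"], h_sum)))  # output gate
    u = tanh(add(matvec(W["u"], x), matvec(U["u"], h_sum)))     # candidate cell
    c = mul(i, u)
    for h_k, c_k in states:
        # One forget gate per child, conditioned on that child's hidden state
        f_k = sigmoid(add(matvec(W["f"], x), matvec(U["f"], h_k)))
        c = add(c, mul(f_k, c_k))
    h = mul(o, tanh(c))
    return h, c

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Two structurally identical ASTs (e.g. one function decompiled on two platforms)
ast_a = ("block", [("assign", [("var", []), ("num", [])]), ("return", [("var", [])])])
ast_b = ("block", [("assign", [("var", []), ("num", [])]), ("return", [("var", [])])])
# A structurally different function
ast_c = ("block", [("while", [("var", []), ("call", [("var", [])])])])

vec_a, _ = encode(ast_a)
vec_b, _ = encode(ast_b)
vec_c, _ = encode(ast_c)
print("same AST:     ", cosine(vec_a, vec_b))  # approximately 1.0 for identical trees
print("different AST:", cosine(vec_a, vec_c))
```

Because each function is reduced to one fixed-size vector, the vectors can be computed once and stored, and each comparison then costs a single vector operation; this is one plausible reading of why vector-based comparison can be fast, though the abstract itself does not explain the speedup.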

Publication details

Source
IEEE/IFIP International Conference on Dependable Systems and Networks | 2021 | pp. 224-236 | 13 pages
Authors
Shouguo Yang; Long Cheng; Yicheng Zeng; Zhe Lang; Hongsong Zhu; Zhiqiang Shi
Original format: PDF
Keywords
Training; Semantics; Prototypes; Computer architecture; Binary codes; Tools; Syntactics; Encoding; Open source software; Microprogramming

Similar literature

1. BinDeep: A deep learning approach to binary code similarity detection [J]. Tian Donghai, Jia Xiaoqi, Ma Rui. Expert Systems with Applications, 2021, issue Apra.
2. Deep Learning-Based Detection for Moderate-Density Code Multiple Access in IoT Networks [J]. Han Yu, Wang Zhenyong, Guo Qing. IEEE Communications Letters, 2020, no. 1.
3. Semantics-Based Obfuscation-Resilient Binary Code Similarity Comparison with Applications to Software and Algorithm Plagiarism Detection [J]. Lannan Luo, Jiang Ming, Dinghao Wu. IEEE Transactions on Software Engineering, 2017, no. 12.
4. Deep Learning-Based Bit Reliability Based Decoding for Non-binary LDPC Codes [C]. Taishi Watanabe, Takeo Ohseki, Kosuke Yamazaki. IEEE International Symposium on Information Theory, 2021.
5. Large-scale image retrieval using similarity preserving binary codes [D]. Gong, Yunchao. 2014.
6. Automatic deep learning-based colorectal adenoma detection system and its similarities with pathologists [O]. Zhigang Song, Chunkai Yu, Shuangmei Zou. 2020.
7. Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection [O]. Xu, Xiaojun; Liu, Chang; Feng, Qian. 2017.
