Published in: IEEE/IFIP International Conference on Dependable Systems and Networks

Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection

Abstract

Binary code similarity detection is a fundamental technique for many security applications such as vulnerability search, patch analysis, and malware detection. As critical vulnerabilities in IoT devices increase, so does the need to detect similar code across architectures for vulnerability search. The variety of IoT hardware architectures and software platforms requires similarity detection to capture the semantic equivalence of code fragments. However, existing approaches fall short in capturing semantic similarity. We notice that the abstract syntax tree (AST) of a function contains rich semantic information. Inspired by successful applications of natural language processing technologies in sentence semantic understanding, we propose a deep learning-based AST-encoding method, named ASTERIA, to measure the semantic equivalence of functions across different platforms. Our method leverages the Tree-LSTM network to learn the semantic representation of a function from its AST. Similarity detection can then be conducted efficiently and accurately by measuring the similarity between two representation vectors. We have implemented an open-source prototype of ASTERIA. The Tree-LSTM model is trained on a dataset of 1,022,616 function pairs and evaluated on a dataset of 95,078 function pairs. Evaluation results show that our method outperforms the AST-based tool Diaphora and the state-of-the-art method Gemini by large margins in binary similarity detection. Our method is also several orders of magnitude faster than Diaphora and Gemini in similarity calculation. In the application of vulnerability search, our tool successfully identified 75 vulnerable functions in 5,979 IoT firmware images.
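
The encode-then-compare idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Child-Sum Tree-LSTM cell without bias terms (the abstract does not say which Tree-LSTM variant is used), untrained random weights, a hypothetical set of AST node labels, and cosine similarity as a stand-in for the unspecified vector comparison. All names below are illustrative.

```python
import math
import random

DIM = 8  # hidden size; an assumption chosen small for illustration
# Hypothetical AST node labels; the real label set is not given in the abstract.
NODE_TYPES = ["block", "if", "call", "asg", "var", "num", "return"]

random.seed(0)  # fixed seed so the untrained weights are reproducible


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Randomly initialised (untrained) parameters; a real model would learn these.
EMBED = {t: [random.uniform(-1, 1) for _ in range(DIM)] for t in NODE_TYPES}
W = {g: rand_matrix(DIM, DIM) for g in "ifou"}  # input-to-gate weights
U = {g: rand_matrix(DIM, DIM) for g in "ifou"}  # hidden-to-gate weights


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs):
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode(node):
    """Encode node = (label, [children]) bottom-up; return (h, c)."""
    label, children = node
    x = EMBED[label]
    states = [encode(child) for child in children]
    # Child-Sum: the children's hidden states are summed before gating.
    h_sum = add(*[h for h, _ in states]) if states else [0.0] * DIM
    i = [sigmoid(z) for z in add(matvec(W["i"], x), matvec(U["i"], h_sum))]
    o = [sigmoid(z) for z in add(matvec(W["o"], x), matvec(U["o"], h_sum))]
    u = [math.tanh(z) for z in add(matvec(W["u"], x), matvec(U["u"], h_sum))]
    c = [iv * uv for iv, uv in zip(i, u)]
    wfx = matvec(W["f"], x)
    for h_k, c_k in states:
        # One forget gate per child, applied to that child's memory cell.
        f_k = [sigmoid(z) for z in add(wfx, matvec(U["f"], h_k))]
        c = [cv + fv * ck for cv, fv, ck in zip(c, f_k, c_k)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Two structurally identical toy ASTs and one different AST.
ast_a = ("block", [("asg", [("var", []), ("num", [])]), ("return", [("var", [])])])
ast_b = ("block", [("asg", [("var", []), ("num", [])]), ("return", [("var", [])])])
ast_c = ("block", [("if", [("call", [("var", [])]), ("return", [("num", [])])])])

vec_a, vec_b, vec_c = (encode(t)[0] for t in (ast_a, ast_b, ast_c))
print("sim(a, b) =", cosine(vec_a, vec_b))
print("sim(a, c) =", cosine(vec_a, vec_c))
```

Identical trees map to identical vectors, so their similarity is 1 up to floating-point rounding. With untrained weights the score for differing trees carries no semantic meaning. Once each function's vector has been computed, comparing two functions needs only a single vector operation, which fits the abstract's point about fast similarity calculation.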