Using recurrent neural networks for decompilation

Abstract

Decompilation, recovering source code from binary, is useful in many situations where it is necessary to analyze or understand software for which source code is not available. Source code is much easier for humans to read than binary code, and there are many tools available to analyze source code. Existing decompilation techniques often generate source code that is difficult for humans to understand because the generated code often does not use the coding idioms that programmers use. Differences from human-written code also reduce the effectiveness of analysis tools on the decompiled source code. To address the problem of differences between decompiled code and human-written code, we present a novel technique for decompiling binary code snippets using a model based on Recurrent Neural Networks. The model learns properties and patterns that occur in source code and uses them to produce decompilation output. We train and evaluate our technique on snippets of binary machine code compiled from C source code. The general approach we outline in this paper is not language-specific and requires little or no domain knowledge of a language and its properties or how a compiler operates, making the approach easily extensible to new languages and constructs. Furthermore, the technique can be extended and applied in situations to which traditional decompilers are not targeted, such as for decompilation of isolated binary snippets; fast, on-demand decompilation; domain-specific learned decompilation; optimizing for readability of decompilation; and recovering control flow constructs, comments, and variable or function names. We show that the translations produced by this technique are often accurate or close and can provide a useful picture of the snippet's behavior.
