IEEE Transactions on Software Engineering

BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences



Abstract

Binary diffing is the process of discovering the differences and similarities in functionality between two binary programs. Previous research approaches binary diffing as a function-matching problem: an initial 1:1 mapping between functions is formulated, and a sequence-matching ratio is then computed to classify two functions as an exact match, a partial match, or no match. Existing techniques are accurate only when detecting exact matches; they are inefficient at detecting partially changed functions, especially those with minor patches. These drawbacks stem from two major challenges: (i) in the 1:1 mapping phase, a strict policy is used to match function features; (ii) in the classification phase, an assembly snippet is treated as ordinary text, and sequence matching is used for similarity comparison. An instruction has a unique structure, i.e., mnemonics and registers occupy specific positions within the instruction and carry semantic relationships, which makes assembly code different from general text. Sequence matching performs well on general text, but it fails to detect structural and semantic changes at the instruction level; using it for classification therefore produces many false results. In this research, we address the aforementioned underlying challenges with a two-fold solution. For the 1:1 mapping phase, we propose computationally inexpensive features, which are compared under distance-based selection criteria to map similar functions and filter out unmatched functions. For the classification phase, we propose a Siamese binary-classification neural network in which each branch is an attention-based distributed-embedding neural network that learns the semantic similarity among assembly instructions and learns to highlight changes at the instruction level; a final-stage fully connected layer learns to accurately classify each 1:1-mapped function pair as either an exact or a partial match.
We used x86 kernel binaries for training and achieved ~99% classification accuracy, which is higher than existing binary diffing techniques and tools.
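The sequence-matching baseline criticized above can be reproduced with Python's standard `difflib`. The snippet below is an illustrative sketch (not the paper's code): a pure register renaming, which leaves the semantics unchanged, still lowers the textual match ratio, which is the kind of false difference the abstract attributes to treating assembly as ordinary text.

```python
from difflib import SequenceMatcher

# Two semantically identical snippets that differ only in register allocation.
f1 = ["mov eax, [ebp+8]", "add eax, 4", "ret"]
f2 = ["mov ecx, [ebp+8]", "add ecx, 4", "ret"]

ratio = SequenceMatcher(None, "\n".join(f1), "\n".join(f2)).ratio()
print(f"sequence match ratio: {ratio:.2f}")  # below 1.0 despite identical semantics
```

A threshold-based classifier built on this ratio would report a partial change here, even though nothing functional changed.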
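The 1:1 mapping phase could be sketched as follows. The feature set (instruction count, call count, basic-block count), the Euclidean distance, and the threshold are all illustrative assumptions standing in for the paper's inexpensive features and distance-based selection criteria.

```python
import math

def distance(a, b):
    # Euclidean distance over a cheap numeric feature vector.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def map_functions(feats_old, feats_new, threshold=3.0):
    """Greedy 1:1 mapping: each old function takes its nearest unmatched
    new function, provided the distance stays under the threshold;
    everything else is filtered out as unmatched."""
    mapping, taken = {}, set()
    for name_a, fa in feats_old.items():
        best = min(
            ((distance(fa, fb), name_b)
             for name_b, fb in feats_new.items() if name_b not in taken),
            default=None,
        )
        if best and best[0] <= threshold:
            mapping[name_a] = best[1]
            taken.add(best[1])
    return mapping

# Hypothetical features: (instruction count, call count, basic-block count)
old = {"f": (40, 2, 5), "g": (12, 0, 2)}
new = {"f'": (42, 2, 5), "h": (90, 7, 14)}
print(map_functions(old, new))  # {'f': "f'"}  (g and h stay unmatched)
```

Only pairs that survive this phase would be passed to the neural classifier.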
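The Siamese classifier itself can be sketched in plain NumPy: two weight-shared branches embed token sequences (a simple mean-pooled embedding here stands in for the paper's attention-based distributed-embedding network), and a logistic layer on the absolute difference of the two embeddings scores the pair. Weights are random, so this shows only the architecture, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 64, 16
E = rng.normal(size=(VOCAB, DIM))   # token-embedding table, shared by both branches
w, b = rng.normal(size=DIM), 0.0    # final logistic classification layer

def branch(tokens):
    # Shared branch: mean-pool token embeddings (stand-in for the
    # attention-based embedding network described in the abstract).
    return E[np.asarray(tokens)].mean(axis=0)

def siamese_score(tokens_a, tokens_b):
    # Element-wise |difference| of the two embeddings, then a sigmoid,
    # giving a probability that the pair is an exact (vs. partial) match.
    d = np.abs(branch(tokens_a) - branch(tokens_b))
    return 1.0 / (1.0 + np.exp(-(d @ w + b)))

score = siamese_score([3, 7, 7, 12], [3, 7, 12])
print(f"match probability: {score:.3f}")
```

Because both branches share `E`, the same instruction sequence always maps to the same embedding, which is the defining property of the Siamese arrangement.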
