Source Code Authorship Attribution Using Long Short-Term Memory Based Networks


Abstract

Machine learning approaches to source code authorship attribution attempt to find statistical regularities in human-generated source code that can identify the author or authors of that code. This has applications in plagiarism detection, intellectual property infringement, and post-incident forensics in computer security. The introduction of features derived from the Abstract Syntax Tree (AST) of source code has recently set new benchmarks in this area, significantly improving over previous work that relied on easily obfuscated lexical and formatting features of program source code. However, these AST-based approaches rely on hand-constructed features derived from such trees, and often include ancillary information such as function and variable names that may be obfuscated or manipulated. In this work, we provide novel contributions to AST-based source code authorship attribution using deep neural networks. We implement Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) models to automatically extract relevant features from the AST representation of programmers' source code. We show that our models can automatically learn efficient representations of AST-based features without needing the hand-constructed ancillary information used by previous methods. Our empirical study on multiple datasets with different programming languages shows that our proposed approach achieves state-of-the-art performance for source code authorship attribution on AST-based features, despite not leveraging information that was previously thought to be required for high-confidence classification.
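The abstract does not say how an AST is turned into input for an LSTM, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It uses Python's standard `ast` module to flatten a program into a depth-first sequence of node type names and then maps those names to integer IDs, the kind of sequence a recurrent model could consume. The helper names `node_type_sequence` and `encode` are hypothetical and chosen for illustration.

```python
import ast


def node_type_sequence(source):
    """Return AST node type names in depth-first, pre-order traversal."""
    tree = ast.parse(source)
    sequence = []
    stack = [tree]
    while stack:
        node = stack.pop()
        sequence.append(type(node).__name__)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(list(ast.iter_child_nodes(node))))
    return sequence


def encode(sequence, vocab):
    """Map node type names to integer IDs, growing the vocabulary as needed."""
    return [vocab.setdefault(name, len(vocab)) for name in sequence]


# Hypothetical sample program; identifiers such as "add", "a", "b" never
# appear in the output because only node *types* are recorded.
sample = "def add(a, b):\n    return a + b\n"
seq = node_type_sequence(sample)
vocab = {}
ids = encode(seq, vocab)
print(seq)
print(ids)
```

Because only node types are kept, renaming functions or variables leaves the sequence unchanged, which illustrates why a purely structural representation is less sensitive to the identifier obfuscation the abstract mentions.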


Bibliographic details

Source: European Symposium on Research in Computer Security, 2017, pp. 65-82 (18 pages)
Authors: Bander Alsulami; Edwin Dauber; Richard Harang; Spiros Mancoridis; Rachel Greenstadt
Original format: PDF
Keywords: Source code authorship attribution; Code stylometry; Long short-term memory; Abstract syntax tree; Security; Privacy


Similar literature

1. Comparing techniques for authorship attribution of source code [J]. Steven Burrows, Alexandra L. Uitdenbogerd, Andrew Turpin, Software. 2014, No. 1
2. Pose-based multisource networks using convolutional neural network and long short-term memory for action recognition [J]. Hu Fangqiang, Wu Qianyu, Zhang Sai, Journal of Electronic Imaging. 2019, No. 4
3. A Deep Learning-based Artificial Neural Network Method for Instance-based Arabic Language Authorship Attribution [J]. Mohammad Al-Sarem, Abdullah Alsaeedi, Faisal Saeed, International Journal of Advances in Soft Computing and Its Applications. 2020, No. 2
4. Source Code Authorship Attribution Using Long Short-Term Memory Based Networks [C]. Bander Alsulami, Edwin Dauber, Richard Harang, European Symposium on Research in Computer Security. 2017
5. An Exploration of Source Code Authorship Attribution Using Sequence Learning [D]. Kong, Xiangling. 2019
6. Authorship attribution of source code by using back propagation neural network based on particle swarm optimization [O]. Xinyu Yang, Guoai Xu, Qi Li, 2011
7. Authorship attribution of source code by using back propagation neural network based on particle swarm optimization [O]. Xinyu Yang, Guoai Xu, Qi Li, 2017
