Future Generation Computer Systems

Code authorship identification using convolutional neural networks



Abstract

Although source code authorship identification poses a privacy threat to many open-source contributors, it is an important topic in forensics. It supports applications such as ghostwriting detection, copyright dispute settlement, and other kinds of code analysis. This work proposes a code authorship identification system based on a convolutional neural network (CNN). The system combines term frequency-inverse document frequency (TF-IDF), word embedding modeling, and feature learning techniques to represent source code. This representation is then fed into a CNN-based authorship identification model that predicts the code's author. On data from Google Code Jam, the approach reaches an identification accuracy of up to 99.4% with 150 candidate programmers and 96.2% with 1,600 programmers. It also identifies programmers accurately on real-world code samples from 1,987 public GitHub repositories, reaching 95% accuracy for 745 C programmers and 97% for C++ programmers. These results indicate that the proposed approach is not language-specific and can identify programmers across different programming languages. © 2018 Elsevier B.V. All rights reserved.
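The abstract names TF-IDF as one of the code-representation steps. The sketch below shows roughly what that step could look like when applied to source-code tokens. The tokenizer regex and the function names `tokenize` and `tfidf_vectors` are hypothetical. The abstract does not say how the authors tokenize code or weight terms, so this is not their implementation. It also covers only the TF-IDF step and leaves out the word-embedding and CNN stages.

```python
import math
import re
from collections import Counter


def tokenize(code: str) -> list[str]:
    # Hypothetical lexer (an assumption, not the paper's): identifiers/keywords,
    # integer literals, and single non-space punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", code)


def tfidf_vectors(samples: list[str]) -> list[dict[str, float]]:
    # Term frequency per sample: raw token counts.
    docs = [Counter(tokenize(s)) for s in samples]
    n = len(docs)

    # Document frequency: in how many samples each token appears.
    df: Counter = Counter()
    for d in docs:
        df.update(d.keys())

    vectors = []
    for d in docs:
        total = sum(d.values())
        # Normalized TF times unsmoothed IDF, log(n / df).
        # Tokens present in every sample therefore get weight 0.
        vec = {t: (c / total) * math.log(n / df[t]) for t, c in d.items()}
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    samples = [
        "int main() { return 0; }",
        "for (int i = 0; i < n; ++i) sum += i;",
    ]
    for vec in tfidf_vectors(samples):
        # Print the highest-weighted (most distinctive) tokens of each sample.
        top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(top)
```

Tokens that every sample shares, such as `int` in the example above, get zero weight. Tokens that are rare across the corpus, like `main` or `sum`, get higher weight. This is one plausible reason a TF-IDF signal can help separate authors' habits, but the paper's actual weighting scheme may differ.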
