Future Generation Computer Systems

Code authorship identification using convolutional neural networks

Abstract

Although source code authorship identification poses a privacy threat to many open-source contributors, it is an important topic in forensics and enables many successful forensic applications, including ghostwriting detection, copyright dispute settlement, and other code analysis tasks. This work proposes a code authorship identification system based on a convolutional neural network (CNN). The proposed system uses term frequency-inverse document frequency (TF-IDF), word embedding modeling, and feature learning techniques to represent code. This representation is then fed into a CNN-based authorship identification model that identifies the code's author. Evaluated on data from Google Code Jam, our approach achieves an identification accuracy of up to 99.4% with 150 candidate programmers and 96.2% with 1,600 programmers. The evaluation also shows high accuracy in identifying programmers from real-world code samples drawn from 1,987 public repositories on GitHub: 95% for 745 C programmers and 97% for C++ programmers. These results indicate that the proposed approach is not language-specific and can identify programmers across different programming languages. (C) 2018 Elsevier B.V. All rights reserved.
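The abstract names TF-IDF as one ingredient of the code representation but gives no details of how it is computed. The following is a minimal sketch of plain TF-IDF over source-code tokens, assuming a simple regex tokenizer; the names `TOKEN_RE`, `tokenize`, and `tf_idf` are hypothetical, the formulas are the textbook tf = count/length and idf = log(N/df), and the word-embedding and CNN stages are omitted entirely. It is an illustration of the technique, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not from the paper): identifiers and
# keywords, integer literals, and every other non-space character as its own token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Split a source file into a flat list of lexical tokens."""
    return TOKEN_RE.findall(code)


def tf_idf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return a shared, sorted vocabulary and one TF-IDF vector per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), with df(t) = number of documents containing t
    """
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(token_lists)
    # Document frequency: how many files contain each token at least once.
    df = Counter(tok for toks in token_lists for tok in set(toks))
    vocab = sorted(df)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty files
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Usage: two tiny C snippets standing in for files by different authors.
samples = [
    "int main() { int i = 0; return i; }",
    "int main() { for (int k = 0; k < 3; k++) {} return 0; }",
]
vocab, vecs = tf_idf(samples)
for tok in ("int", "i", "k"):
    j = vocab.index(tok)
    print(f"{tok!r}: {vecs[0][j]:.3f} {vecs[1][j]:.3f}")
```

In this toy example, `int` gets a weight of 0 in both vectors because it appears in every sample (idf = log 1 = 0), while the loop-variable names `i` and `k` get positive weight only in the file that uses them. Down-weighting shared tokens like this is what lets a TF-IDF representation emphasize author-specific habits before the vectors are passed to a downstream classifier such as a CNN.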