Journal: Future Generation Computer Systems

Pkg2Vec: Hierarchical package embedding for code authorship attribution

Abstract

Authorship attribution of software is the task of identifying the author of a given piece of code. Code attribution is important in multiple scenarios, ranging from software plagiarism to cybersecurity. In this paper, we introduce authorship attribution of software packages, which better reflects real-world scenarios in which code is organized in packages and written by teams. We present a novel approach for software package authorship attribution called Pkg2Vec, based on a hierarchical deep neural network (DNN) architecture that corresponds to the hierarchical nature of software (code) packages. The hierarchical neural network model consists of a token-level encoder and an attention mechanism for a function-level encoder, which together produce a package embedding. Beyond the package embedding, we use keywords and API calls as resilient features that reflect the programmer's intention and style. Pkg2Vec is evaluated on a large dataset of public packages and compared to a number of state-of-the-art source code authorship attribution algorithms. We find that Pkg2Vec significantly outperforms other approaches, achieving a 13% improvement in accuracy.
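The hierarchy described in the abstract (tokens, then functions, then one package vector) can be illustrated with a small sketch. This is not the Pkg2Vec implementation: the hash-based `token_vector`, the mean-pooling `encode_function`, the fixed `context` vector, and the dimension `DIM` are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. A trained model would presumably learn the token encoder and the attention parameters instead.

```python
import hashlib
import math

DIM = 8  # toy embedding size (assumption, not taken from the paper)


def token_vector(token, dim=DIM):
    """Deterministic toy embedding: hash bytes scaled into [-1, 1]."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b / 127.5) - 1.0 for b in digest[:dim]]


def encode_function(tokens, dim=DIM):
    """Stand-in for a token-level encoder: mean of the token vectors."""
    if not tokens:
        return [0.0] * dim
    vecs = [token_vector(t, dim) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed_package(functions, context, dim=DIM):
    """Attention-pool the function vectors into a single package vector.

    Each function is scored by its dot product with `context`; the
    softmax of those scores weights the sum of function vectors.
    """
    func_vecs = [encode_function(f, dim) for f in functions]
    scores = [sum(a * b for a, b in zip(v, context)) for v in func_vecs]
    weights = softmax(scores)
    package = [
        sum(w * v[i] for w, v in zip(weights, func_vecs)) for i in range(dim)
    ]
    return package, weights


# A "package" here is just a list of functions, each a list of tokens.
package = [
    ["def", "read", "open", "path", "return"],
    ["for", "item", "in", "items", "print"],
    ["import", "socket", "connect", "send"],
]
context = [1.0] * DIM  # hypothetical; would be learned in a real model
vector, weights = embed_package(package, context)
```

The resulting `vector` has a fixed length regardless of how many functions the package contains, which is what would let a downstream classifier compare packages of different sizes. The `weights` list shows how much each function contributed to the package vector.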