Source: PLoS Clinical Trials (U.S. National Institutes of Health literature)

Authorship attribution of source code by using back propagation neural network based on particle swarm optimization


Abstract

Authorship attribution is the task of identifying the most likely author of a given sample from a set of known candidate authors. It applies not only to plain text such as novels, blogs, emails, and posts, but also to source code, where the goal is to identify the programmer. Authorship attribution of source code is needed in diverse applications, ranging from malicious code tracking to resolving authorship disputes and detecting software plagiarism. This paper proposes a new method that identifies the programmer of Java source code samples with higher accuracy. To this end, it first introduces a back propagation (BP) neural network based on particle swarm optimization (PSO) into authorship attribution of source code. The method begins by computing a set of defined feature metrics, comprising lexical and layout metrics as well as structure and syntax metrics, for a total of 19 dimensions. These metrics are then fed to the neural network for supervised learning, the weights of which are produced by a hybrid PSO and BP algorithm. The effectiveness of the proposed method is evaluated on a collected dataset of 3,022 Java files belonging to 40 authors. Experimental results show that the proposed method achieves 91.060% accuracy. A comparison with previous work on authorship attribution of Java source code shows that the proposed method outperforms the others overall, with acceptable overhead.
