A hybrid code representation learning approach for predicting method names

Abstract

Program semantic properties such as class names, method names, and variable names and types play an important role in software development and maintenance. Method names are of particular importance because they provide the cornerstone of abstraction that developers use to communicate with each other for various purposes (e.g., code review and program comprehension). Existing method name prediction approaches often represent code as lexical tokens or syntactic AST (abstract syntax tree) paths, which makes it difficult for them to learn code semantics and hinders their effectiveness in predicting method names. Initial attempts have been made to represent code as execution traces to capture code semantics, but they suffer from poor scalability when collecting execution traces. In this paper, we propose a hybrid code representation learning approach, named METH2SEQ, to encode a method as a sequence of distributed vectors. METH2SEQ represents a method as (1) a bag of paths on the program dependence graph, (2) a sequence of typed intermediate representation statements, and (3) a sentence of natural language comment, to scalably capture code semantics. The learned sequence of vectors of a method is fed to a decoder model to predict method names. Our evaluation on a dataset of 280.5K methods in 67 Java projects demonstrates that METH2SEQ outperforms two state-of-the-art code representation learning approaches in F1-score by 92.6% and 36.6%, and two state-of-the-art method name prediction approaches in F1-score by 85.6% and 178.1%.
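
The abstract reports F1-score gains but does not say how the score is computed. As a rough illustration only, the sketch below follows a convention that is common in method name prediction work: split the predicted and actual names into lowercase sub-tokens and compute precision, recall, and F1 over the overlap. The function names (`split_subtokens`, `subtoken_f1`, `relative_improvement`) are hypothetical, and reading the reported percentages as relative improvements over a baseline is an assumption, not something the record confirms.

```python
import re
from collections import Counter
from typing import List


def split_subtokens(name: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase sub-tokens."""
    # Acronym runs (e.g. "XML"), capitalized or lowercase words, or digit runs.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def subtoken_f1(predicted: str, actual: str) -> float:
    """F1 over sub-tokens of one predicted name against the actual name.

    Assumption: order is ignored and matching is case-insensitive.
    """
    pred = Counter(split_subtokens(predicted))
    gold = Counter(split_subtokens(actual))
    overlap = sum((pred & gold).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent (assumed reading)."""
    return (new - baseline) / baseline * 100.0
```

For example, predicting `getName` for a method actually named `getFileName` matches two of three actual sub-tokens, giving precision 1.0, recall 2/3, and F1 of 0.8 under this convention.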

Bibliographic record

  • Source
    The Journal of Systems and Software, 2021, Issue 10, pp. 111011.1-111011.15 (15 pages)
  • Author affiliations

    School of Computer Science, Fudan University, China; Shanghai Key Laboratory of Data Science, Fudan University, China

    School of Computer Science, Fudan University, China; Shanghai Key Laboratory of Data Science, Fudan University, China

    School of Computer Science, Fudan University, China; Shanghai Key Laboratory of Data Science, Fudan University, China

    School of Computer Science, Fudan University, China; Shanghai Key Laboratory of Data Science, Fudan University, China

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Code representation learning; Method name prediction; Deep learning

