A hybrid code representation learning approach for predicting method names

Fengyi Zhang; Bihuan Chen; Rongfan Li; Xin Peng

Abstract

Program semantic properties such as class names, method names, and variable names and types play an important role in software development and maintenance. Method names are of particular importance because they provide the cornerstone of abstraction that developers use to communicate with each other for various purposes (e.g., code review and program comprehension). Existing method name prediction approaches often represent code as lexical tokens or syntactic AST (abstract syntax tree) paths, which makes it difficult for them to learn code semantics and hinders their effectiveness in predicting method names. Initial attempts have been made to represent code as execution traces to capture code semantics, but they suffer from poor scalability in collecting execution traces. In this paper, we propose a hybrid code representation learning approach, named METH2SEQ, that encodes a method as a sequence of distributed vectors. METH2SEQ represents a method as (1) a bag of paths on the program dependence graph, (2) a sequence of typed intermediate representation statements, and (3) a natural language comment sentence, to scalably capture code semantics. The learned sequence of vectors for a method is fed to a decoder model to predict the method name. Our evaluation on a dataset of 280.5K methods in 67 Java projects demonstrates that METH2SEQ outperforms two state-of-the-art code representation learning approaches in F1-score by 92.6% and 36.6%, and two state-of-the-art method name prediction approaches in F1-score by 85.6% and 178.1%.
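The abstract reports F1-scores but does not say how they are computed. As a minimal sketch, assuming the subtoken-level F1 commonly used in method name prediction work (an assumption, not something this record confirms), a predicted name and the actual name are each split into subtokens and compared as multisets. The function names `subtokens`, `subtoken_f1`, and `relative_gain` are hypothetical. The reported gains are read here as relative improvements, since a gain of 178.1% cannot be an absolute difference between two F1-scores.

```python
import re
from collections import Counter


def subtokens(name: str) -> list[str]:
    """Split a camelCase or snake_case method name into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def subtoken_f1(predicted: str, actual: str) -> float:
    """F1 over subtoken multisets (assumed metric; not confirmed by the abstract)."""
    pred = Counter(subtokens(predicted))
    gold = Counter(subtokens(actual))
    overlap = sum((pred & gold).values())  # subtokens present in both names
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0
```

For example, predicting `getFileName` when the actual name is `getName` matches two of three predicted subtokens (precision 2/3) and both actual subtokens (recall 1), which gives an F1 of 0.8 under this metric.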

Bibliographic record

Source
The Journal of Systems and Software, 2021, Issue 10, pp. 111011.1-111011.15 (15 pages)
Authors
Fengyi Zhang; Bihuan Chen; Rongfan Li; Xin Peng
Author affiliations (the same for all four authors)

School of Computer Science, Fudan University, China; Shanghai Key Laboratory of Data Science, Fudan University, China

Indexing information
Original format: PDF
Language: English
Keywords: Code representation learning; Method name prediction; Deep learning

Similar literature

1. Development of a Three-Stage Hybrid Model by Utilizing a Two-Stage Signal Decomposition Methodology and Machine Learning Approach to Predict Monthly Runoff at Swat River Basin, Pakistan [J]. Muhammad Sibtain, Xianshan Li, Ghulam Nabi. Discrete Dynamics in Nature and Society, 2020, Issue 4.
2. Predictive learning in rate-coded neuronal networks: a theoretical approach towards classical conditioning [J]. Bernd Porr, Florentin Worgotter. Neurocomputing, 2002.
3. A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations [J]. Zhuangwei Shi, Han Zhang, Chen Jin. BMC Bioinformatics, 2021, Issue 1.
4. Exploiting Method Names to Improve Code Summarization: A Deliberation Multi-Task Learning Approach [C]. Rui Xie, Wei Ye, Jinan Sun. IEEE/ACM International Conference on Program Comprehension; International Conference on Software Engineering, 2021.
5. Hybrid Machine Learning Approach for Predictive Modeling of Complex Systems [D]. Singh, Shubhendu Kumar. 2019.
6. Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation [O]. Chao Wei, Senlin Luo, Xincheng Ma. 2011.
7. Exploiting Method Names to Improve Code Summarization: A Deliberation Multi-Task Learning Approach [O]. Rui Xie, Wei Ye, Jinan Sun. 2021.
