Hierarchical Learning of Cross-Language Mappings Through Distributed Vector Representations for Code

Abstract

Translating a program written in one programming language to another can be useful for software development tasks that need implementations of the same functionality in different languages. Although past studies have considered this problem, they tend to be specific either to the language grammars or to certain kinds of code elements (e.g., tokens, phrases, API uses). This paper proposes a new approach to automatically learn cross-language representations for various kinds of structural code elements that may be used for program translation. Our key idea is twofold. First, we normalize and enrich code token streams with additional structural and semantic information, and train cross-language vector representations for the tokens (a.k.a. shared embeddings) based on word2vec, a neural-network-based technique for producing word embeddings. Second, hierarchically from the bottom up, we construct shared embeddings for code elements of higher levels of granularity (e.g., expressions, statements, methods) from the embeddings of their constituents, and then build mappings among code elements across languages based on similarities among embeddings. Our preliminary evaluations on about 40,000 Java and C# source files from 9 software projects show that our approach can automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision scores. When compared with an existing tool for mapping library API methods, our approach identifies many more mappings accurately. The mapping results and code can be accessed at https://github.com/bdqnghi/hierarchical-programming-language-mapping. We believe that our idea for learning cross-language vector representations with code structural information can be a useful step towards automated program translation.
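The two steps in the abstract (token-level shared embeddings, then bottom-up composition and similarity-based mapping) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the token vectors are hand-written toy values rather than word2vec output, the `java:`/`cs:` token names are hypothetical, and a higher-level element is embedded as the plain mean of its constituents, which is only one possible composition rule.

```python
import math

# Toy shared token embeddings. These values are made up for illustration;
# a real pipeline would train them (e.g. with word2vec) on enriched token streams.
TOKEN_VECS = {
    "java:System.out.println": [0.90, 0.10, 0.00],
    "cs:Console.WriteLine":    [0.88, 0.12, 0.02],
    "java:String":             [0.10, 0.90, 0.00],
    "cs:string":               [0.12, 0.88, 0.01],
    "java:length":             [0.00, 0.20, 0.90],
    "cs:Length":               [0.02, 0.18, 0.92],
}


def compose(tokens):
    """Embed a higher-level element as the mean of its constituents' vectors.

    Averaging is an assumption of this sketch, not a rule taken from the paper.
    """
    vecs = [TOKEN_VECS[t] for t in tokens]
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_vec, candidates):
    """Return the name of the candidate whose embedding is most similar to the query."""
    return max(candidates, key=lambda name: cosine(query_vec, candidates[name]))


# A Java "statement" built from two tokens, and two candidate C# statements.
java_stmt = compose(["java:System.out.println", "java:String"])
cs_candidates = {
    "print_string": compose(["cs:Console.WriteLine", "cs:string"]),
    "string_length": compose(["cs:string", "cs:Length"]),
}

match = best_match(java_stmt, cs_candidates)
print(match)
```

With these toy vectors the Java statement is mapped to the C# candidate built from the corresponding tokens. The same `compose` step could be applied again to statement vectors to obtain method-level vectors, which is the bottom-up idea the abstract describes.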

Bibliographic details

Source: 2018 IEEE/ACM 40th International Conference on Software Engineering: New Ideas and Emerging Technologies Results | 2018 | pp. 33-36 | 4 pages
Conference location: Gothenburg (SE)
Authors: Nghi D. Q. Bui; Lingxiao Jiang
Original format: PDF
Language of text: English (eng)
Keywords: Java; Semantics; C# languages; Task analysis; Libraries; Natural languages; Software
Date indexed: 2022-08-26 13:55:28

Similar literature

1. Distributed video coding supporting hierarchical GOP structures with transmitted motion vectors [J]. Kyung-Yeon Min, Woong Lim, Junghak Nam. EURASIP Journal on Image and Video Processing. 2015, no. 1
2. Semantically Readable Distributed Representation Learning and Its Expandability Using a Word Semantic Vector Dictionary [J]. Ikuo Keshi, Yu Suzuki, Koichiro Yoshino. IEICE Transactions on Information and Systems. 2018, no. 4
3. Representation learning by hierarchical ELM auto-encoder with double random hidden layers [J]. Li Rui, Wang Xiaodan, Lei Lei. IET Computer Vision. 2019, no. 4
4. Hierarchical Learning of Cross-Language Mappings Through Distributed Vector Representations for Code [C]. Nghi D. Q. Bui, Lingxiao Jiang. IEEE/ACM International Conference on Software Engineering: New Ideas and Emerging Technologies Results. 2018
5. An exploration of the word2vec algorithm: Creating a vector representation of a language vocabulary that encodes meaning and usage patterns in the vector space structure [D]. Le, Thu Anh. 2016
6. Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model [O]. Lujia Chen, Chunhui Cai, Vicky Chen. 2016
7. Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph [O]. Zhuoren Jiang, Yue Yin, Liangcai Gao. 2018
