A language model for statements of software code

Abstract

Building language models for source code enables a wide range of improvements to traditional software engineering tasks. One promising application is automatic code completion. State-of-the-art techniques capture code regularities at the token level using lexical information. Such language models are well suited to predicting short token sequences, but become less effective for long, statement-level predictions. In this paper, we propose PCC to optimize token-level language modeling. Specifically, PCC introduces an intermediate representation (IR) for source code, which puts tokens into groups using lexeme and variable relative order. In this way, PCC can handle long token sequences, i.e., group sequences, and suggest a complete statement with a precise synthesizer. Furthermore, PCC employs a fuzzy matching technique that combines genetic and longest common subsequence algorithms to make predictions more accurate. We have implemented a code completion plugin for Eclipse and evaluated it on open-source Java projects. The results demonstrate the potential of PCC in generating precise long statement-level predictions. In 30%-60% of the cases, it correctly suggests the complete statement with only six candidates, and in 40%-90% of the cases with ten candidates.
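The abstract names the longest common subsequence (LCS) algorithm as one ingredient of PCC's fuzzy matching. The sketch below illustrates only that ingredient, as a generic LCS-based similarity between two token sequences. It is not PCC's implementation: the genetic component and the IR grouping are omitted, and the function names (`lcs_length`, `lcs_similarity`, `rank_candidates`), the normalization formula, and the `top_k` parameter are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    # Classic dynamic programme, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Similarity in [0, 1]; this normalization is an assumption, not from the paper."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def rank_candidates(context, candidates, top_k=6):
    """Hypothetical helper: keep the top_k candidates most similar to the context."""
    ranked = sorted(candidates, key=lambda c: lcs_similarity(context, c), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    typed = ["int", "i", "=", "0", ";"]
    seen = ["int", "j", "=", "0", ";"]
    print(lcs_similarity(typed, seen))
```

Because the comparison works on any sequence of hashable items, the same functions could be applied to raw tokens or to token groups; which of the two PCC actually compares is not stated in the abstract.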


Bibliographic information

Source
IEEE/ACM International Conference on Automated Software Engineering, 2017, pp. 682-687 (6 pages)
Authors
Yixiao Yang; Yu Jiang; Ming Gu; Jiaguang Sun; Jian Gao; Han Liu
Original format: PDF
Keywords
Predictive models; Synthesizers; Java; Training data; Training; Software; Context modeling


Similar literature

1. SLDeep: Statement-level software defect prediction using deep-learning model on static code features [J]. Majd Amirabbas, Vahidi-Asl Mojtaba, Khalilian Alireza. Expert systems with applications, 2020, issue "Juna"
2. Characterizing and evaluating the quality of software process modeling language: Comparison of ten representative model-based languages [J]. Garcia-Garcia J. A., Enriquez J. G., Dominguez-Mayo F. J. Computer standards & interfaces, 2019, issue "MARa"
3. Sisp: simplified interface for stochastic programming: Establishing a Hard Link between Mathematical Programming Modeling Languages and SMPS Codes [J]. Christian Condevaux-Lanloy, Emmanuel Fragniere, Alan J. King. Optimization methods & software, 2002, no. 3
4. A Language Model for Statements of Software Code [C]. Yixiao Yang, Yu Jiang, Ming Gu. IEEE/ACM International Conference on Automated Software Engineering, 2017
5. Adaptable Software Reuse: Binding Time Aware Modelling Language to Support Variations of Feature Binding Time in Software Product Line Engineering [D]. Umar, Armaya'u Zango. 2020
6. Characteristics of mathematical modeling languages that facilitate model reuse in systems biology: a software engineering perspective [O]. Christopher Schölzel, Valeria Blesius, Gernot Ernst. 2021
7. Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models [O]. Zanon Boito, Marcely; Bérard, Alexandre; Villavicencio, Aline. 2017
