Deep Generation of Coq Lemma Names Using Elaborated Terms

Abstract

Coding conventions for naming, spacing, and other essentially stylistic properties are necessary for developers to effectively understand, review, and modify source code in large software projects. Consistent conventions in verification projects based on proof assistants, such as Coq, increase in importance as projects grow in size and scope. While conventions can be documented and enforced manually at high cost, emerging approaches automatically learn and suggest idiomatic names in Java-like languages by applying statistical language models to large code corpora. However, due to its powerful language extension facilities and fusion of type checking and computation, Coq is a challenging target for automated learning techniques. We present novel generation models for learning and suggesting lemma names for Coq projects. Our models, based on multi-input neural networks, are the first to leverage syntactic and semantic information from Coq's lexer (tokens in lemma statements), parser (syntax trees), and kernel (elaborated terms) for naming; the key insight is that learning from elaborated terms can substantially boost model performance. We implemented our models in a toolchain, dubbed Roosterize, and applied it to a large corpus of code derived from the Mathematical Components family of projects, known for its stringent coding conventions. Our results show that Roosterize substantially outperforms baselines for suggesting lemma names, highlighting the importance of using multi-input models and elaborated terms.

Bibliographic details

Source
International Joint Conference on Automated Reasoning | 2020 | pp. 97-118 | 22 pages
Authors
Pengyu Nie; Karl Palmskog; Junyi Jessy Li; Milos Gligoric
Original format: PDF
Keywords
Proof assistants; Coq; Lemma names; Neural networks
