首页> 外文会议>International Joint Conference on Automated Reasoning >Deep Generation of Coq Lemma Names Using Elaborated Terms
【24h】

Deep Generation of Coq Lemma Names Using Elaborated Terms

机译:使用精心制作的术语深层生成Coq引理名称

获取原文
获取外文期刊封面目录资料

摘要

Coding conventions for naming, spacing, and other essentially stylistic properties are necessary for developers to effectively understand, review, and modify source code in large software projects. Consistent conventions in verification projects based on proof assistants, such as Coq, increase in importance as projects grow in size and scope. While conventions can be documented and enforced manually at high cost, emerging approaches automatically learn and suggest idiomatic names in Java-like languages by applying statistical language models on large code corpora. However, due to its powerful language extension facilities and fusion of type checking and computation, Coq is a challenging target for automated learning techniques. We present novel generation models for learning and suggesting lemma names for Coq projects. Our models, based on multi-input neural networks, are the first to leverage syntactic and semantic information from Coq's lexer (tokens in lemma statements), parser (syntax trees), and kernel (elaborated terms) for naming; the key insight is that learning from elaborated terms can substantially boost model performance. We implemented our models in a toolchain, dubbed Roosterize, and applied it on a large corpus of code derived from the Mathematical Components family of projects, known for its stringent coding conventions. Our results show that Roosterize substantially outperforms baselines for suggesting lemma names, highlighting the importance of using multi-input models and elaborated terms.
机译:为使开发人员有效地理解,查看和修改大型软件项目中的源代码,必须使用命名,间距和其他基本样式属性的编码约定。随着项目规模和范围的扩大,基于证明助手(例如Coq)的验证项目中的一致约定的重要性越来越高。尽管可以以高成本手动记录和实施约定,但新兴的方法是通过在大型代码库上应用统计语言模型来自动学习和建议类似Java语言的惯用名称。但是,由于其强大的语言扩展功能以及类型检查和计算的融合,Coq是自动化学习技​​术的一个极具挑战性的目标。我们提出了新颖的生成模型,用于学习和建议Coq项目的引理名称。我们基于多输入神经网络的模型是第一个利用来自Coq的词法分析器(引理语句中的令牌),解析器(语法树)和内核(详细术语)的命名的语法和语义信息的公司。关键的见解是,从详尽的术语中学习可以大大提高模型的性能。我们在名为Roosterize的工具链中实现了我们的模型,并将其应用于源自Mathematical Components系列项目的大量代码集,这些项目以其严格的编码约定而闻名。我们的结果表明,对于建议引理名称,“公鸡化”实质上优于基线,突出了使用多输入模型和精心设计的术语的重要性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号