Conference on Empirical Methods in Natural Language Processing

SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications

Abstract

Recent research proposes syntax-based approaches to the problem of generating programs from natural language specifications. These approaches typically train a sequence-to-sequence model with a syntax-based objective: maximum likelihood estimation (MLE). Such approaches do not effectively address the goal of generating semantically correct programs, because they fail to handle Program Aliasing, i.e., the fact that semantically equivalent programs may take many syntactically different forms. To address this issue, we propose a semantics-based approach named SemRegex. SemRegex targets a subtask of the program-synthesis problem: generating regular expressions from natural language. Unlike existing syntax-based approaches, SemRegex trains the model by maximizing the expected semantic correctness of the generated regular expressions. Semantic correctness is measured using a DFA-equivalence oracle, random test cases, and distinguishing test cases. Experiments on three public datasets demonstrate the superiority of SemRegex over existing state-of-the-art approaches.
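To make Program Aliasing and test-based semantic correctness concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes Python `re` syntax rather than whatever regex notation the datasets use, and the helper names (`random_strings`, `agrees_on`, `semantic_reward`) and the 0/1 reward are hypothetical choices made for illustration.

```python
import random
import re


def random_strings(alphabet, max_len, n, seed=0):
    """Draw n random strings over `alphabet` with length 0..max_len."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        for _ in range(n)
    ]


def agrees_on(candidate, reference, tests):
    """True if both patterns accept or reject every test string alike."""
    cand, ref = re.compile(candidate), re.compile(reference)
    return all(
        (cand.fullmatch(s) is None) == (ref.fullmatch(s) is None) for s in tests
    )


def semantic_reward(candidate, reference, tests):
    """Hypothetical 0/1 reward: 1.0 if the candidate agrees with the
    reference on every test string, else 0.0 (assumed reward shape)."""
    try:
        return 1.0 if agrees_on(candidate, reference, tests) else 0.0
    except re.error:
        return 0.0  # a pattern that fails to compile earns no reward


tests = random_strings("abc", max_len=6, n=500)

# Program aliasing: different syntax, same accepted language.
print(semantic_reward("[ab]*", "(a|b)*", tests))

# Not equivalent: the empty string is a distinguishing test case.
print(semantic_reward("[ab]+", "(a|b)*", tests + [""]))
```

An exact-match (syntax-based) comparison would treat `[ab]*` as wrong against the reference `(a|b)*`, while the test-based check treats them as the same. Random tests can only reveal a disagreement and never prove equivalence, which is presumably why the abstract also lists a DFA-equivalence oracle and distinguishing test cases.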
