首页> 外文学位 >Program Synthesis for Empowering End Users and Stress-Testing Compilers.
【24h】

Program Synthesis for Empowering End Users and Stress-Testing Compilers.

机译:用于增强最终用户和压力测试编译器的程序综合。

获取原文
获取原文并翻译 | 示例

摘要

Since 2012, the number of people with access to computing devices has exploded. This trend is due to these devices' falling prices, and their increased functionalities and programmability. Two challenges arise from this trend: (1) How to assist the increasing number of device owners, most of whom are non-programmers, yet hope to take full advantage of their devices, and (2) how to make critical software, which is relied upon by other software running on these devices, more reliable? Our vision is that program synthesis is the solution for the above arising challenges. This dissertation describes various program synthesis techniques to synthesize programs from natural languages, input/output examples, and existing programs to tackle these challenges.;We present SmartSynth, a natural language interface that synthesizes smartphone programs from natural language descriptions. SmartSynth enables users to automate their repetitive tasks by leveraging various phone sensors. It is the first system that combines the advances in both natural language processing (NLP) and program synthesis: it uses NLP techniques to parse a given command, and applies program synthesis techniques to resolve parsing ambiguities. We have adapted and extended SmartSynth's algorithms to integrate it into TouchDevelop, a popular touch-based programming environments for end users.;We develop FlashExtract, a system that extracts data from semi-structured document (such as text files, webpages, and spreadsheets) using examples. Natural language does not work well in this domain because the tasks are complicated and users usually do not know how to perform them. In FlashExtract, users only need to highlight some sample regions to be extracted, the system will learn a program to select similar regions automatically. They can also provide nested examples to extract structured, hierarchical data. FlashExtract has been shipped as the cmdlet ConvertFrom-String of PowerShell in Microsoft Windows 10 and as the Custom Fields feature in Microsoft Azure Operational Insights.;While it is important to make programming accessible to end users, it is also vital to improve the correctness of critical software because they impact other software products. We introduce Equivalence Modulo Inputs (EMI), a novel, general methodology to synthesize valid compiler test programs from existing test cases. Given a test program and a set of its inputs, we profile the program's execution over the inputs. We then generate new test variants by randomly removing code that is unexecuted under the provided inputs. The expectation is that these new variants should behave exactly the same as the original program under the same inputs; any observed discrepancy indicates a compiler bug. Our technique is simple to realize yet very effective. In total, we have reported more than 400 bugs in GCC and LLVM, most of which have already been fixed. Many compiler vendors have adopted EMI to test their compilers.;We also believe that cross-disciplinary solutions can increase the impact of program synthesis techniques. We therefore further explore program synthesis in the following three dimensions. First, we introduce two novel user interaction models to help users resolve ambiguities in programming-by-example systems like FlashExtract. One model allows users to effectively navigate among the large set of programs consistent with the provided examples. The other model uses active learning to ask questions to help differentiate the outputs of these programs. Second, we present a guided, advanced mutation strategy based on Bayesian optimization to synthesize EMI variants more effectively. Our improved technique supports both code deletions and insertions in the unexecuted regions, and uses Markov Chain Monte Carlo (MCMC) optimization to guide the synthesis process. Finally, we apply program synthesis to find bugs in a new domain: the link-time optimizer components in compilers.
机译:自2012年以来,使用计算设备的人数激增。这种趋势是由于这些设备的价格下降,以及其功能和可编程性的提高。这种趋势带来了两个挑战:(1)如何为越来越多的设备所有者提供帮助,其中大多数是非程序员,但他们希望充分利用他们的设备;(2)如何制作关键软件,这是这些设备上运行的其他软件所依赖,是否更可靠?我们的愿景是,程序综合是上述挑战的解决方案。本文介绍了多种程序合成技术,可以从自然语言中合成程序,输入/输出示例,以及可以解决这些挑战的现有程序。我们提供了一种自然语言接口SmartSynth,它可以从自然语言描述中合成智能手机程序。 SmartSynth使用户能够利用各种电话传感器自动执行重复性任务。这是第一个结合了自然语言处理(NLP)和程序合成方面的先进技术的系统:它使用NLP技术来解析给定命令,并应用程序合成技术来解决解析歧义。我们调整并扩展了SmartSynth的算法,以将其集成到最终用户流行的基于触摸的编程环境TouchDevelop中。我们开发了FlashExtract,该系统可从半结构化文档(例如文本文件,网页和电子表格)中提取数据使用示例。自然语言在该领域不能很好地工作,因为任务很复杂并且用户通常不知道如何执行它们。在FlashExtract中,用户只需突出显示要提取的一些样本区域,系统就会学习一个程序以自动选择相似的区域。它们还可以提供嵌套示例,以提取结构化的分层数据。 FlashExtract已作为Microsoft Windows 10中PowerShell的cmdlet ConvertFrom-String以及Microsoft Azure Operational Insights中的“自定义字段”功能提供。;尽管使最终用户可以访问程序很重要,但提高程序的正确性也至关重要。关键软件,因为它们会影响其他软件产品。我们介绍了等效模输入(EMI),这是一种新颖的通用方法,可以从现有测试用例中合成有效的编译器测试程序。给定一个测试程序及其一组输入,我们通过输入来分析程序的执行情况。然后,我们通过随机删除在提供的输入下未执行的代码来生成新的测试变体。期望这些新变体在相同输入下的行为应与原始程序完全相同。任何观察到的差异都表明编译器错误。我们的技术很容易实现,但非常有效。总计,我们已经报告了GCC和LLVM中的400多个错误,其中大多数已得到修复。许多编译器供应商已采用EMI对其编译器进行测试。我们还相信,跨学科的解决方案会增加程序综合技术的影响。因此,我们将在以下三个方面进一步探索程序综合。首先,我们介绍了两种新颖的用户交互模型,以帮助用户解决FlashExtract这样的按示例编程系统中的歧义。一个模型允许用户在与所提供示例一致的大量程序之间有效地导航。另一个模型使用主动学习来提问,以帮助区分这些程序的输出。其次,我们提出了一种基于贝叶斯优化的指导性高级突变策略,可以更有效地合成EMI变体。我们改进的技术支持未执行区域中的代码删除和插入,并使用马尔可夫链蒙特卡洛(MCMC)优化来指导合成过程。最后,我们应用程序合成来查找新领域的错误:编译器中的链接时优化器组件。

著录项

  • 作者

    Le, Vu Minh.;

  • 作者单位

    University of California, Davis.;

  • 授予单位 University of California, Davis.;
  • 学科 Computer science.
  • 学位 Ph.D.
  • 年度 2015
  • 页码 199 p.
  • 总页数 199
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号