
Synthesizing benchmarks for predictive modeling

Abstract

Predictive modeling using machine learning is an effective method for building compiler heuristics, but there is a shortage of benchmarks. Typical machine learning experiments outside of the compilation field train over thousands or millions of examples. In machine learning for compilers, however, there are typically only a few dozen common benchmarks available. This limits the quality of learned models, as they have very sparse training data for what are often high-dimensional feature spaces. What is needed is a way to generate an unbounded number of training programs that finely cover the feature space. At the same time, the generated programs must be similar to the types of programs that human developers actually write; otherwise the learning will target the wrong parts of the feature space. We mine open-source repositories for program fragments and apply deep learning techniques to automatically construct models of how humans write programs. We sample these models to generate an unbounded number of runnable training programs. The quality of the programs is such that even human developers struggle to distinguish our generated programs from hand-written code. We use our generator for OpenCL programs, CLgen, to automatically synthesize thousands of programs and show that learning over these improves the performance of a state-of-the-art predictive model by 1.27x. In addition, the fine covering of the feature space automatically exposes weaknesses in the feature design which are invisible with the sparse training examples from existing benchmark suites. Correcting these weaknesses further increases performance by 4.30x.