Compiling for stream processing

Abstract

This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses information about the program structure and estimates of kernel and memory operation execution times to overlap kernel execution with memory transfers, maximizing performance, and to optimize use of scarce on-chip memory, significantly reducing external memory bandwidth. Our compiler applies optimizations such as strip-mining, loop unrolling, and software pipelining at the level of kernels and stream memory operations. We evaluate the performance of our compiler on a suite of media and scientific benchmarks. Our results show that compiler management of on-chip storage reduces external memory bandwidth by 35% to 93% and reduces execution time by 23% to 72% compared to cache-like LRU management of the same storage. We show that strip-mining stream applications enables producer-consumer locality to be captured in on-chip storage, reducing external bandwidth by 50% to 80%. We also evaluate the sensitivity of performance to the scheduling methods used and to critical resources. Overall, our compiler is able to overlap memory operations and manage local storage so that 78% to 96% of program execution time is spent running computational kernels.
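The two ideas the abstract leans on most, strip-mining to keep producer-consumer data on chip and overlapping memory operations with kernel execution, can be illustrated with a toy model. The sketch below is not the paper's compiler or its cost model; the names `strip_mine`, `run_pipeline`, and `schedule_time`, the two stand-in kernels, the `capacity` spill rule, and the single-memory-channel timing formula are all assumptions made for illustration.

```python
def strip_mine(n_elements, strip_size):
    """Yield (start, end) index ranges that cover n_elements in strips."""
    for start in range(0, n_elements, strip_size):
        yield start, min(start + strip_size, n_elements)


def run_pipeline(data, capacity, strip_size):
    """Run a producer kernel and a consumer kernel strip by strip.

    Returns the output stream and a count of words moved to or from
    external memory. Assumed toy rule: the intermediate stream stays on
    chip only if it fits in `capacity` words; otherwise it is spilled
    (stored once, reloaded once).
    """
    traffic = 0
    out = []
    for start, end in strip_mine(len(data), strip_size):
        strip = data[start:end]
        traffic += len(strip)                  # load the input strip
        intermediate = [x * 2 for x in strip]  # stand-in producer kernel
        if len(intermediate) > capacity:
            traffic += 2 * len(intermediate)   # spill: store plus reload
        result = [x + 1 for x in intermediate]  # stand-in consumer kernel
        traffic += len(result)                 # store the output strip
        out.extend(result)
    return out, traffic


def schedule_time(n_strips, load_t, kernel_t, store_t, overlap):
    """Estimate total time for n_strips under an idealized schedule.

    Without overlap, every strip pays load + kernel + store in sequence.
    With overlap (a software-pipelined schedule, assuming one memory
    channel), each steady-state step costs max(kernel, load + store),
    with one exposed load up front and one exposed store at the end.
    """
    if not overlap:
        return n_strips * (load_t + kernel_t + store_t)
    mem_t = load_t + store_t
    return load_t + (n_strips - 1) * max(kernel_t, mem_t) + kernel_t + store_t


if __name__ == "__main__":
    data = list(range(1000))
    _, whole = run_pipeline(data, capacity=256, strip_size=1000)
    _, stripped = run_pipeline(data, capacity=256, strip_size=256)
    print("external words, no strip-mining:", whole)
    print("external words, strip-mined:   ", stripped)
    print("time, serialized:", schedule_time(4, 2, 10, 2, overlap=False))
    print("time, overlapped:", schedule_time(4, 2, 10, 2, overlap=True))
```

In this toy model, strips that fit on chip avoid the spill traffic entirely, and the overlapped schedule spends most of its time in kernels. The specific numbers it prints follow from the made-up parameters and should not be read as reproducing the measurements reported in the abstract.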