Compiling for stream processing

Abstract

This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses information about the program structure and estimates of kernel and memory operation execution times to overlap kernel execution with memory transfers, maximizing performance, and to optimize use of scarce on-chip memory, significantly reducing external memory bandwidth. Our compiler applies optimizations such as strip-mining, loop unrolling, and software pipelining at the level of kernels and stream memory operations. We evaluate the performance of our compiler on a suite of media and scientific benchmarks. Our results show that compiler management of on-chip storage reduces external memory bandwidth by 35% to 93% and reduces execution time by 23% to 72% compared to cache-like LRU management of the same storage. We show that strip-mining stream applications enables producer-consumer locality to be captured in on-chip storage, reducing external bandwidth by 50% to 80%. We also evaluate the sensitivity of performance to the scheduling methods used and to critical resources. Overall, our compiler is able to overlap memory operations and manage local storage so that 78% to 96% of program execution time is spent running computational kernels.
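As a rough illustration of why strip-mining can capture producer-consumer locality, the toy model below counts external memory traffic for a two-kernel pipeline with and without strip-mining. It is only a sketch under stated assumptions, not the compiler described in the paper: the function names, the two example kernels, and the simplification that `capacity` bounds only the intermediate stream are all hypothetical.

```python
def external_words(n, capacity, strip_mined):
    """Count words moved to/from external memory for a producer -> consumer pipeline.

    Assumption (toy model): the intermediate stream stays on chip only if
    one strip of it fits within `capacity` words; otherwise it is spilled
    to external memory and reloaded.
    """
    if n == 0:
        return 0
    strip = min(n, capacity) if strip_mined else n
    traffic = 0
    for start in range(0, n, strip):
        length = min(strip, n - start)
        traffic += length              # load the input strip
        if length > capacity:          # intermediate does not fit on chip
            traffic += 2 * length      # spill it, then reload it
        traffic += length              # store the output strip
    return traffic


def run_pipeline(data, capacity):
    """Run the two hypothetical kernels strip by strip."""
    out = []
    for start in range(0, len(data), capacity):
        strip = data[start:start + capacity]
        intermediate = [2 * x for x in strip]      # producer kernel
        out.extend(y + 1 for y in intermediate)    # consumer kernel
    return out


baseline = external_words(1000, 100, strip_mined=False)
mined = external_words(1000, 100, strip_mined=True)
```

In this model the whole-stream schedule moves 4000 words and the strip-mined schedule moves 2000, because the intermediate stream never leaves on-chip storage. The figures come from the toy model only and are not results from the paper.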

Bibliographic details

Source: International Conference on Parallel Architectures and Compilation Techniques, 2006, 10 pages
Authors: Abhishek Das; William J. Dally; Peter Mattson
Original format: PDF
Chinese Library Classification: TP332.11
Keywords: task level parallelism
Indexed: 2022-08-20 19:42:27

Related literature

1. Fei Teng 64 Stream Processing System: Architecture, Compiler, and Programming [J] . Xuejun Yang, Xiaobo Yan, Zuocheng Xing. IEEE Transactions on Parallel and Distributed Systems . 2009, No. 8
2. An approach to instruction set compiled simulator development based on a target processor C compiler back-end design [J] . Miodrag Djukic, Nenad Cetic, Radovan Obradovic. Innovations in Systems and Software Engineering . 2013, No. 3
3. Optimized Compiler for Intel® Itanium® Processor Family and Compiler Enhancements from NEC [J] . Shoichi SAKON, Hideki YAMAMOTO, Kazuhiro KUSANO. NEC Research & Development . 2003, No. 1
4. The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors [C] . Paul M. Carpenter, Alex Ramirez, Eduard Ayguade. IEEE international conference on systems, architectures, modeling and simulation; Transactions on high-performance embedded architectures and compilers; International workshop on systems, architectures, modeling and simulation . 2009
5. Compiling Stream Applications for Heterogeneous Architectures [D] . Hormati, Amir H. 2011
6. Streaming MASSIF: Cascading Reasoning for Efficient Processing of IoT Data Streams [O] . Pieter Bonte, Riccardo Tommasini, Emanuele Della Valle. 2018
7. Compiling for stream processing [O] . Abhishek Das, William J. Dally, Peter Mattson. 2006
