Journal: Parallel Computing

A data-localization compilation scheme using partial-static task assignment for Fortran coarse-grain parallel processing


Abstract

This paper proposes a compilation scheme for data localization using partial-static task assignment for Fortran coarse-grain parallel processing, or macro-dataflow processing, on a multiprocessor system with local memories and centralized shared memory. Data localization makes effective use of local memories and reduces data transfer overhead in a dynamic task-scheduling environment. The proposed compilation scheme consists mainly of the following three parts: (1) loop-aligned decomposition, which decomposes each of a set of loops that have data dependences among them into smaller loops, and groups the decomposed loops into data-localizable groups so that data shared among the decomposed loops inside each group can be passed via local memory and data transfer overhead among the groups is minimized; (2) partial static task assignment, which informs the dynamic scheduling routine generator in the macro-dataflow compiler that the decomposed loops inside each data-localizable group must be assigned to the same processor; (3) parallel machine code generation, which generates parallel machine code that passes shared data inside a group through local memory and transfers data among groups through centralized shared memory. This compilation scheme has been implemented for a multiprocessor system, OSCAR (Optimally SCheduled Advanced multiprocessoR), which has centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow processing with the proposed data-localization scheme can reduce execution time by 20% on average, compared with macro-dataflow processing without data localization.
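The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the idea behind parts (1) and (2), under strong simplifying assumptions: every loop runs over the same index range, and iteration i of a later loop uses only the element produced by iteration i of the earlier loop. All function names here are hypothetical, and the round-robin binding is merely a stand-in for a real dynamic scheduler, not the OSCAR compiler's method.

```python
def split_range(n, parts):
    """Split the iteration range [0, n) into `parts` contiguous chunks."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def loop_aligned_decomposition(loop_names, n, parts):
    """Cut every loop at the same boundaries and group the pieces that
    touch the same index range (assumed: dependences are element-wise)."""
    groups = {}
    for g, (lo, hi) in enumerate(split_range(n, parts)):
        groups[g] = [(name, lo, hi) for name in loop_names]
    return groups


def partial_static_assignment(groups):
    """Compile-time part: record which group each small loop belongs to.
    The processor itself is NOT chosen here."""
    return {task: g for g, tasks in groups.items() for task in tasks}


def dynamic_schedule(ready_order, group_of, num_procs):
    """Run-time part (stand-in): the first task of a group picks a
    processor round-robin; later tasks of that group follow it."""
    bound, placement, next_proc = {}, {}, 0
    for task in ready_order:
        g = group_of[task]
        if g not in bound:
            bound[g] = next_proc % num_procs
            next_proc += 1
        placement[task] = bound[g]
    return placement


groups = loop_aligned_decomposition(["loop1", "loop2"], n=10, parts=3)
group_of = partial_static_assignment(groups)
placement = dynamic_schedule(sorted(group_of), group_of, num_procs=2)
for task in sorted(placement):
    print(task, "-> processor", placement[task])
```

In this sketch the assignment is "partial" in the sense that only the grouping is fixed ahead of time; the processor that a group lands on is still decided while the program runs. Because both pieces of a group end up on one processor, the data they share could stay in that processor's local memory.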
