Published in: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications

An Automatic Parallel-Stage Decoupled Software Pipelining Parallelization Algorithm Based on OpenMP

Abstract

While multicore processors increase throughput for multiprogrammed and multithreaded code, many important applications are single-threaded and thus do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders traditional automatic parallelization techniques unsuitable. Parallel-Stage Decoupled Software Pipelining (PS-DSWP) was proposed to exploit the fine-grained pipeline parallelism latent in ordinary programs at the instruction level, in the presence of all kinds of dependences, including arbitrary control dependences. However, it requires knowledge of architectural properties, as well as hardware support in the form of a communication channel and two special instructions. In this paper we propose an improved PS-DSWP algorithm based on OpenMP. It operates on a high-level intermediate representation and therefore does not rely on a particular CPU architecture. Moreover, the Program Dependence Graph (PDG) used in the algorithm is built from basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built from instructions. OpenMP is used to assign tasks and to synchronize threads, which avoids any dependence on hardware support. We evaluate loops with complex memory access patterns and control flow, which traditional techniques cannot handle, on a multicore platform. With our algorithm these loops can be parallelized and gain significant performance improvements: we obtain a maximum speedup of 2.07x and an average speedup of 1.39x with 5 threads.
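To make the pipeline shape described in the abstract more concrete, here is a minimal sketch of a three-stage decoupled pipeline with one parallel stage. It is not the authors' implementation: it uses Python threads and `queue.Queue` as stand-ins for OpenMP threads and synchronization, and the names `Node`, `build_list`, `run_pipeline`, and the worker count are all assumptions made for illustration. The first stage walks a linked list (a loop-carried pointer dependence, so it stays sequential), the middle stage is replicated across several workers, and the last stage collects results in iteration order.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream (illustrative)


class Node:
    """Singly linked list node; traversal carries a loop dependence."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a linked list from a Python sequence (helper for the demo)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def run_pipeline(head, work, num_workers=3):
    """Run a sequential -> parallel -> sequential pipeline over the list."""
    q_in = queue.Queue()   # stage 1 -> stage 2
    q_out = queue.Queue()  # stage 2 -> stage 3
    results = {}

    def stage1():
        # Sequential stage: pointer chasing cannot be split across threads.
        node, index = head, 0
        while node is not None:
            q_in.put((index, node.value))
            node = node.next
            index += 1
        for _ in range(num_workers):
            q_in.put(DONE)  # one sentinel per worker

    def stage2():
        # Parallel stage: iterations are independent, so it is replicated.
        while True:
            item = q_in.get()
            if item is DONE:
                q_out.put(DONE)
                return
            index, value = item
            q_out.put((index, work(value)))

    def stage3():
        # Sequential stage: gather results, keyed by iteration index.
        finished = 0
        while finished < num_workers:
            item = q_out.get()
            if item is DONE:
                finished += 1
            else:
                index, out = item
                results[index] = out

    threads = [threading.Thread(target=stage1)]
    threads += [threading.Thread(target=stage2) for _ in range(num_workers)]
    threads.append(threading.Thread(target=stage3))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(results))]
```

Calling `run_pipeline(build_list([1, 2, 3]), lambda x: x * x)` runs the squaring step in the replicated middle stage. The sketch only shows the structure of the decoupling; because of Python's global interpreter lock it should not be expected to reproduce the speedups reported in the abstract.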
