Conference: Hawaii International Conference on System Science (HICSS-31)

Effects of parallelism degree on run-time parallelization of loops

Abstract

Because of the overhead of exploiting and managing parallelism, run-time loop parallelization techniques that aim to maximize parallelism do not necessarily lead to the best performance. In this paper, we present two parallelization techniques that exploit different degrees of parallelism for loops with dynamic cross-iteration dependences. The DOALL approach exploits iteration-level parallelism. It restructures the loop into a sequence of do-parallel loops, separated by barrier operations. Iterations of a do-parallel loop are run in parallel. By contrast, the DOACROSS approach exposes fine-grained reference-level parallelism. It allows dependent iterations to be run concurrently by inserting point-to-point synchronization operations to preserve dependences. The DOACROSS approach has variants that identify different amounts of parallelism among consecutive reads to the same memory location. We evaluate the algorithms for loops with various structures, memory access patterns, and computational workloads on symmetric multiprocessors. The algorithms are scheduled using block cyclic decomposition strategies. The experimental results show that the DOACROSS technique outperforms the DOALL technique, even though the latter is widely used in compile-time parallelization of loops. Of the DOACROSS variants, the algorithm allowing partially concurrent reads performs best because it incurs only slightly more overhead than the algorithm disallowing concurrent reads. The benefit from allowing fully concurrent reads is significant for small loops that do not have enough parallelism. However, it is likely to be outweighed by its cost for large loops or loops with light workload.
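The DOALL restructuring described in the abstract can be illustrated with a small inspector/executor sketch. This is not the paper's algorithm, only a minimal illustration under stated assumptions: the loop body is taken to be `a[writes[i]] = a[reads[i]] + 1`, and the names `inspect`, `wavefronts`, and `run_doall` are hypothetical. The inspector assigns each iteration to a wavefront so that no two iterations in the same wavefront depend on each other; the executor then runs each wavefront as a do-parallel loop, with a barrier between wavefronts. The DOACROSS approach with point-to-point synchronization is not shown.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def inspect(reads, writes):
    """Inspector: assign each iteration the earliest wavefront that respects
    flow, output, and anti dependences on earlier iterations."""
    last_write = {}  # location -> wavefront of the most recent write
    last_read = {}   # location -> highest wavefront of any read so far
    stage = []
    for r, w in zip(reads, writes):
        s = 0
        if r in last_write:   # flow dependence: read after write
            s = max(s, last_write[r] + 1)
        if w in last_write:   # output dependence: write after write
            s = max(s, last_write[w] + 1)
        if w in last_read:    # anti dependence: write after read
            s = max(s, last_read[w] + 1)
        stage.append(s)
        last_write[w] = s
        last_read[r] = max(last_read.get(r, -1), s)
    return stage


def wavefronts(stage):
    """Group iteration indices by wavefront number, in increasing order."""
    groups = defaultdict(list)
    for i, s in enumerate(stage):
        groups[s].append(i)
    return [groups[s] for s in sorted(groups)]


def run_doall(a, reads, writes, workers=4):
    """Executor: run each wavefront as a do-parallel loop."""
    def body(i):
        # Assumed loop body, chosen only for illustration.
        a[writes[i]] = a[reads[i]] + 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for front in wavefronts(inspect(reads, writes)):
            # Draining the map with list() waits for every iteration in
            # this wavefront, which serves as the barrier operation.
            list(pool.map(body, front))
    return a


if __name__ == "__main__":
    reads = [0, 1, 2, 1, 3]
    writes = [1, 2, 3, 4, 1]
    print(wavefronts(inspect(reads, writes)))
    print(run_doall([10, 20, 30, 40, 50], reads, writes))
```

In this sketch every wavefront pays for a full barrier, even when only one iteration is waiting on one earlier write. That cost is the kind of overhead the abstract weighs against the finer-grained synchronization of DOACROSS.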