Journal of Parallel and Distributed Computing

Topology and computational-power aware tile mapping of perfectly nested loops with dependencies on distributed systems

Abstract

Nested loops are a main source of parallelism in many scientific applications. Partitioning the iteration space of nested loops with data dependencies into tiles and assigning the tiles to processing nodes for parallel execution is essential for achieving high performance. Although most previous work has focused on tiling for fully connected homogeneous distributed systems, some studies have been devoted to tiling for partially connected distributed systems. In this paper, we address the parallelization of perfectly nested loops with dependencies on partially connected heterogeneous distributed systems and present a topology and computational-power aware tile mapping. This work aims to take into account not only each node's computational power when tiling the iteration space of nested loops but also the network topology when mapping tiles to processing nodes. This approach helps minimize the parallel execution time by improving load balancing and minimizing communication costs. We demonstrate the performance of the proposed method by comparing it with computational-power aware tile mapping and topology aware tile mapping. The experimental results show that the proposed method improves the parallel execution time by up to 62% and 28% compared with computational-power aware tile mapping and topology aware tile mapping, respectively. (C) 2019 Elsevier Inc. All rights reserved.
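
The idea of sizing tiles by node computational power can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify: it only splits one dimension of an iteration space among nodes in proportion to assumed relative speeds, and it ignores data dependencies and network topology. The names `proportional_split` and `makespan`, and the speed values, are hypothetical.

```python
from typing import List


def proportional_split(n_iterations: int, powers: List[float]) -> List[int]:
    """Split n_iterations among nodes in proportion to their relative power.

    Illustrative assumption: powers[i] is the relative speed of node i.
    Rounding uses the largest-remainder rule so the sizes sum to n_iterations.
    """
    if n_iterations < 0 or not powers or any(p <= 0 for p in powers):
        raise ValueError("need n_iterations >= 0 and positive powers")
    total = sum(powers)
    ideal = [n_iterations * p / total for p in powers]
    sizes = [int(x) for x in ideal]  # floor of each ideal share
    leftover = n_iterations - sum(sizes)
    # Give the remaining iterations to the nodes with the largest fractional parts.
    order = sorted(range(len(powers)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


def makespan(sizes: List[int], powers: List[float]) -> float:
    """Compute-only finish time: the slowest node's work divided by its speed."""
    return max(s / p for s, p in zip(sizes, powers))


if __name__ == "__main__":
    powers = [1.0, 3.0]                     # hypothetical relative node speeds
    balanced = proportional_split(100, powers)
    print(balanced, makespan(balanced, powers))
    print([50, 50], makespan([50, 50], powers))
```

In this toy case the proportional split `[25, 75]` finishes in 25 time units, while an equal split `[50, 50]` takes 50 because the slower node becomes the bottleneck. A topology-aware mapping would additionally weigh the communication cost between nodes that hold dependent tiles, which this sketch leaves out.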
