Partitioning regular grid applications with irregular boundaries for cache-coherent multiprocessors


Abstract

We consider the problem of partitioning, for a cache-coherent multiprocessor, applications that operate on a regular grid but have irregular boundaries. Domain decomposition techniques such as recursive spectral bisection (RSB) have commonly been used to reduce interprocessor communication in message-passing multiprocessors. We apply these partitioning algorithms on cache-coherent multiprocessors to reduce cache-coherency traffic. We find that the actual cache-coherency traffic is approximately double the estimated true coherency traffic, primarily because of false sharing and the false coherency traffic it causes. We devise two techniques that eliminate false-sharing traffic in partitions produced by the common domain decomposition algorithms. In our compensation algorithm, we modify the partition produced by the domain decomposition to ensure that all the nodes on a cache line are assigned to the same processor. In our coalescing algorithm, nodes belonging to the same cache line are coalesced into a single node, and the weights on nodes and arcs are adjusted to represent the overall computation and communication costs of the coalesced nodes. This coalesced graph is partitioned using a domain decomposition algorithm, and the coalesced nodes in the partition are then expanded. Our experimental results using an Indian Ocean circulation application on the KSR1 multiprocessor demonstrate that compensation reduces coherency traffic by as much as 55% and execution time by up to 18%, and that graph coalescing reduces coherency traffic by up to 74%.
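
The two techniques can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`compensate`, `coalesce`, `expand`), the dictionary-based graph representation, and the `line_of` map from grid node to cache line are all hypothetical. In particular, the abstract does not say how compensation picks the owner of a cache line; the sketch assumes a majority vote with ties going to the lowest processor id.

```python
from collections import Counter, defaultdict


def compensate(partition, line_of):
    """Reassign every node on a cache line to a single processor.

    ASSUMPTION: the owner is the processor that already holds most of
    the line's nodes (ties go to the lowest processor id). The abstract
    does not state the selection rule.
    """
    votes = defaultdict(Counter)
    for node, proc in partition.items():
        votes[line_of[node]][proc] += 1
    owner = {}
    for line, counts in votes.items():
        best = max(counts.values())
        owner[line] = min(p for p, c in counts.items() if c == best)
    return {node: owner[line_of[node]] for node in partition}


def coalesce(node_weight, arc_weight, line_of):
    """Merge nodes sharing a cache line into one weighted node."""
    line_weight = Counter()
    for node, w in node_weight.items():
        line_weight[line_of[node]] += w  # total computation cost of the line
    line_arcs = Counter()
    for (u, v), w in arc_weight.items():
        a, b = line_of[u], line_of[v]
        if a != b:  # arcs inside one line no longer cross any boundary
            line_arcs[(min(a, b), max(a, b))] += w
    return dict(line_weight), dict(line_arcs)


def expand(line_partition, line_of):
    """Map a partition of coalesced nodes back onto the grid nodes."""
    return {node: line_partition[line] for node, line in line_of.items()}


# Toy input: 8 grid nodes in a row, 4 nodes per cache line (hypothetical sizes).
line_of = {i: i // 4 for i in range(8)}

# A partition that splits cache line 0 between processors 0 and 1.
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
fixed = compensate(partition, line_of)

# Coalescing the same grid: unit node weights, unit arcs between neighbours.
node_weight = {i: 1 for i in range(8)}
arc_weight = {(i, i + 1): 1 for i in range(7)}
line_weight, line_arcs = coalesce(node_weight, arc_weight, line_of)
```

In this toy case node 3 moves to processor 0, so cache line 0 is no longer shared, and the coalesced graph has two nodes of weight 4 joined by one arc. Any partitioner that accepts node and arc weights could then be run on the coalesced graph before calling `expand`. A majority vote can disturb the load balance the original partition achieved, which is one reason to treat the selection rule here as a placeholder.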
