...
IEEE Transactions on Parallel and Distributed Systems

Using processor affinity in loop scheduling on shared-memory multiprocessors

Abstract

Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. They compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
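The abstract describes the new algorithm only at a high level, so the following is a toy Python simulation of the trade-off it names (load balance vs. synchronization vs. data locality), not the authors' implementation. The affinity-style policy follows the commonly described shape of affinity scheduling: per-processor queues seeded with each processor's own block, a processor takes 1/k of its local queue per grab, and an idle processor steals 1/P of the most loaded queue. Guided self-scheduling is swapped in as a stand-in for a locality-oblivious "traditional" scheduler. The cost model is an assumption: the data of iteration i lives with processor `i // ceil(N/P)`, a fixed `remote_penalty` is charged for running it elsewhere, and each grab counts as one synchronization. The function names `simulate`, `guided_self_scheduling` and `affinity_scheduling` are hypothetical.

```python
import heapq
import math
from collections import deque


def simulate(costs, num_procs, grab, remote_penalty=0.0):
    """Toy event-driven simulation of a parallel loop.

    Hypothetical cost model, for illustration only: iteration i's data lives in
    the memory of processor i // ceil(n / num_procs); running it on any other
    processor adds `remote_penalty` time units; every grab that returns work
    counts as one synchronization operation.
    Returns (makespan, sync_ops, remote_iterations).
    """
    block = math.ceil(len(costs) / num_procs)
    ready = [(0.0, p) for p in range(num_procs)]  # (time processor goes idle, id)
    heapq.heapify(ready)
    makespan, sync_ops, remote = 0.0, 0, 0
    while ready:
        now, p = heapq.heappop(ready)
        chunk = grab(p)
        if not chunk:  # nothing left this processor can take: it retires
            makespan = max(makespan, now)
            continue
        sync_ops += 1
        misses = sum(1 for i in chunk if i // block != p)
        remote += misses
        busy = sum(costs[i] for i in chunk) + misses * remote_penalty
        heapq.heappush(ready, (now + busy, p))
    return makespan, sync_ops, remote


def guided_self_scheduling(n, num_procs):
    """Locality-oblivious baseline: one central queue; each grab takes
    ceil(remaining / P) iterations, wherever their data lives."""
    queue = deque(range(n))

    def grab(p):
        take = math.ceil(len(queue) / num_procs)
        return [queue.popleft() for _ in range(take)]

    return grab


def affinity_scheduling(n, num_procs, k=None):
    """Affinity-style sketch: per-processor queues seeded with each processor's
    home block. A processor takes ceil(len / k) iterations from its own queue;
    once that is empty it steals ceil(len / P) from the most loaded queue."""
    k = k or num_procs
    block = math.ceil(n / num_procs)
    queues = [deque(range(p * block, min((p + 1) * block, n)))
              for p in range(num_procs)]

    def grab(p):
        if queues[p]:
            take = math.ceil(len(queues[p]) / k)
            return [queues[p].popleft() for _ in range(take)]
        victim = max(queues, key=len)  # empty only when all work is gone
        take = math.ceil(len(victim) / num_procs)
        return [victim.pop() for _ in range(take)]  # steal from the far end

    return grab


if __name__ == "__main__":
    n, procs = 1024, 8
    costs = [1.0 + (i % 7) for i in range(n)]  # synthetic, mildly uneven work
    for name, grab in [("guided self-scheduling", guided_self_scheduling(n, procs)),
                       ("affinity-style", affinity_scheduling(n, procs))]:
        makespan, syncs, remote = simulate(costs, procs, grab, remote_penalty=2.0)
        print(f"{name:24s} makespan={makespan:8.1f} syncs={syncs:4d} remote={remote}")
```

In this toy model the affinity-style scheduler leaves a processor's home block only when that processor runs out of local work. For a perfectly balanced loop its remote count therefore stays at zero, and it grows only under imbalance, while the central queue mixes blocks across processors from the first few grabs. No figures from the paper are reproduced here.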