Integrating support for block data transfer has become an important emphasis in recent cache-coherent shared address space multiprocessors. This paper examines the potential performance benefits of adding this support. A set of ambitious hardware mechanisms is used to study performance gains in five important scientific computations that appear to be good candidates for using block transfer. Our conclusion is that the benefits of block transfer are not substantial for hardware cache-coherent multiprocessors. The main reasons for this are (i) the relatively modest fraction of time applications spend in communication amenable to block transfer, (ii) the difficulty of finding enough independent computation to overlap with the communication latency that remains after block transfer, and (iii) the fact that long cache lines often capture many of the benefits of block transfer in efficient cache-coherent machines. In the cases where block transfer improves performance, prefetching can often provide comparable, if not superior, performance benefits. We also examine the impact of varying important communication parameters and processor speed on the effectiveness of block transfer, and comment on useful features that a block transfer facility should support for real applications.
The Performance Advantages of Integrating Block Data Transfer in Cache-Coherent Multiprocessors