IEEE Transactions on Parallel and Distributed Systems

Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture



Abstract

Ease of programming is one of the main requirements for the broad acceptance of multicore systems that lack hardware support for transparent data transfer between local and global memories. A software cache is a robust way to give the user a transparent view of the memory architecture, but this software approach can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture designed to enable prefetch techniques. Memory accesses are classified at compile time into two classes: high-locality and irregular. Our approach then steers each memory reference toward one of two cache structures, each optimized for its respective access pattern. These cache structures are designed so that high-level compiler optimizations can aggressively unroll loops, reorder cache references, and/or transform surrounding loops, practically eliminating the software-cache overhead in the innermost loop. The cache design enables automatic prefetch and modulo scheduling transformations. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed prefetch techniques, yield speedups of 10 to 20 percent. With the proposed technique, we achieve performance on the Cell BE processor similar to that of a modern server-class multicore such as the IBM PowerPC 970MP processor for a set of parallel NAS applications.
