Parallel and Distributed Computing, Applications and Technologies, 2009

Cache Partitioning on Chip Multi-Processors for Balanced Parallel Scientific Applications



Abstract

Nowadays, more and more supercomputers are built on multi-core processors with shared caches. However, conflicting accesses to the shared cache from different threads or processes become a performance bottleneck for parallel applications. Cache partitioning can be used to allocate cache resources exclusively to different processes according to their demands; conflicting accesses are avoided by restricting each process's cache accesses to a distinct private part of the shared cache. This paper studies the problem of shared cache partitioning for balanced MPI parallel applications on CMP architectures, presenting a performance-oriented cache partitioning framework that includes Spatial-Level Cache Partitioning (SLCP), Time-Level Cache Partitioning (TLCP), and an evaluation of both. We evaluate SLCP and TLCP on a quad-core simulator. Experiments show that SLCP and TLCP outperform the traditional LRU cache replacement policy in both IPC throughput and miss rate. Specifically, for large workloads, TLCP outperforms LRU by up to 20% and by 8.7% on average.
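The mechanism the abstract describes can be illustrated with a minimal sketch. The following Python simulation is not the paper's SLCP/TLCP implementation; it is a hypothetical two-process workload on a single LRU cache set, comparing a fully shared set against way-partitioned private regions, to show how restricting each process to its own ways can remove inter-process conflict misses:

```python
from collections import OrderedDict

class LRUSet:
    """A single cache set with `ways` lines and LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()          # tag -> None, ordered by recency
    def access(self, tag):
        if tag in self.lines:               # hit: refresh recency
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:    # miss: evict least-recently-used line
            self.lines.popitem(last=False)
        self.lines[tag] = None
        return False

def run(trace, set_for):
    """Replay (pid, tag) accesses; return the overall miss rate."""
    misses = sum(not set_for(pid).access((pid, tag)) for pid, tag in trace)
    return misses / len(trace)

# Process 0 reuses two blocks; process 1 streams fresh blocks in bursts,
# which under shared LRU repeatedly evicts process 0's working set.
trace = []
for i in range(20):
    trace += [(0, 0), (0, 1)]                     # process 0's small reused set
    trace += [(1, 4 * i + k) for k in range(4)]   # process 1's streaming accesses

shared = LRUSet(4)                                 # 4 ways shared by both
shared_miss = run(trace, lambda pid: shared)

parts = {0: LRUSet(2), 1: LRUSet(2)}               # 2 private ways per process
part_miss = run(trace, lambda pid: parts[pid])

print(f"shared LRU miss rate:  {shared_miss:.2f}")
print(f"partitioned miss rate: {part_miss:.2f}")
```

In the shared case, process 1's streaming bursts flush process 0's two reused blocks before they are touched again, so every access misses; with two private ways per process, process 0 hits after warm-up while process 1's streaming misses are unchanged, lowering the overall miss rate. This corresponds to the spatial style of partitioning; the paper's TLCP additionally varies allocations over time.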


