IEEE International System-on-Chip Conference

Region based cache coherence for tiled MPSoCs

Abstract

The need for faster and more energy-efficient computing has led us to the multicore era with distributed shared memory hierarchies. The primary goal is to distribute parallel tasks onto multiple processing elements to collectively achieve shorter execution times at lower frequencies and supply voltages than a single-core architecture. Major challenges of this approach are how to achieve local, low-latency memory accesses and low overheads for coherence and synchronization management. We believe that enabling global coherence in tiled many-core architectures does not scale in a cost-efficient manner and is not even required for applications with limited degrees of parallelism. In this paper, we propose a novel region-based cache coherence scheme, where coherence is provided by hardware directories within a flexibly sized but confined set of compute and memory tiles. We also show that data placement and task mapping have a huge impact on application performance and hence should be considered in conjunction with region-based coherence. The approach is evaluated by means of a high-level simulation model using workloads from PARSEC. Experiments demonstrate that our region-based approach with multiple compute tiles increases performance by a factor of up to 2.5 compared to a single-tile structure with nominally identical computing and memory resources. Thus, the independent local memory accesses, which effectively increase the memory bandwidth, usually outweigh the penalties of inter-tile remote memory accesses. Our approach also reduces the directory structures significantly compared to traditional schemes (e.g., by 41.4% for a 16-tile system with 4 tiles per region), making it scalable for large MPSoCs. Regarding data-to-task placement, our investigations show that it can lead to performance variations of up to a factor of 12.7.
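
The directory-size argument in the abstract can be illustrated with a rough sketch. It assumes a full-map (bit-vector) directory with one presence bit per tile that may share a line; the abstract does not say which directory organization the authors use. The names `DirectoryConfig`, `entry_bits`, and `relative_saving`, and the tag and state widths, are hypothetical choices for illustration only. The sketch does not attempt to reproduce the reported 41.4% figure, which depends on details not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryConfig:
    """Hypothetical parameters for a full-map (bit-vector) directory entry."""
    tag_bits: int = 20    # assumed address-tag width, not taken from the paper
    state_bits: int = 2   # assumed coherence-state width, not taken from the paper


def entry_bits(num_sharers: int, cfg: DirectoryConfig = DirectoryConfig()) -> int:
    """Bits per directory entry: tag + state + one presence bit per possible sharer."""
    if num_sharers < 1:
        raise ValueError("num_sharers must be at least 1")
    return cfg.tag_bits + cfg.state_bits + num_sharers


def relative_saving(total_tiles: int, tiles_per_region: int,
                    cfg: DirectoryConfig = DirectoryConfig()) -> float:
    """Fraction of per-entry storage saved when sharers are confined to one region."""
    if not 1 <= tiles_per_region <= total_tiles:
        raise ValueError("tiles_per_region must be between 1 and total_tiles")
    global_bits = entry_bits(total_tiles, cfg)       # any tile may share a line
    region_bits = entry_bits(tiles_per_region, cfg)  # only tiles in the region may share
    return 1.0 - region_bits / global_bits


if __name__ == "__main__":
    # The 16-tile, 4-tiles-per-region configuration mentioned in the abstract.
    saving = relative_saving(total_tiles=16, tiles_per_region=4)
    print(f"per-entry saving under the assumed widths: {saving:.1%}")
```

Under this simplified model, only the presence vector shrinks with the region size, so the saving depends strongly on the assumed tag and state widths. That is one reason the number printed here should not be compared directly with the paper's result.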
