Conference: International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation

CoD: Coherence-on-Demand — Runtime Adaptable Working Set Coherence for DSM-Based Manycore Architectures

Abstract

Embedded system applications, with their inherently limited parallelism, rarely exploit all available processing resources in large DSM-based manycore architectures. In addition, global coherence spanning all tiles does not scale well. Therefore, we have proposed a region-based cache coherence (RBCC) approach that enables coherence among a selectable cluster of tiles in accordance with application requirements. In this paper, we present a novel RBCC-malloc() extension that transparently tailors coherence at runtime to the application working sets that are actually shared. Further, we introduce the design and hardware implementation of a flexibly configurable coherency region manager (CRM) that supports RBCC-malloc(). We synthesized the CRM on an FPGA for a 64-core system and observed a 57% reduction in BRAM utilization compared to a global coherence directory for regions with up to 16 cores. Experiments reveal an application acceleration of up to 42% compared to a message-passing-based implementation. We also demonstrate the advantage of RBCC-malloc() over standalone RBCC.
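The abstract does not spell out the RBCC-malloc() interface or the CRM's internals, so the following is only a rough software model of the idea, not the paper's design. It assumes a hypothetical `rbcc_malloc(size, tiles)` that binds each allocation to the set of tiles that actually share it, and a toy directory that records sharers per cache line. Because a line's sharers can only come from its region, a write invalidates at most the other tiles of that region instead of reaching every tile in the system. The names `rbcc_malloc`, `RegionDirectory` and `CoherenceRegion`, the 64-byte line size, and the choice to reject out-of-region accesses are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # assumed cache-line size in bytes (illustrative only)


@dataclass(frozen=True)
class CoherenceRegion:
    """A cluster of tiles that keep one working set coherent among themselves."""
    tiles: frozenset


@dataclass
class RegionDirectory:
    """Toy model of region-restricted coherence (hypothetical, not the paper's CRM)."""
    regions: list = field(default_factory=list)  # entries: (base, size, CoherenceRegion)
    sharers: dict = field(default_factory=dict)  # cache-line address -> set of tile ids
    next_addr: int = 0x1000                      # bump pointer for the toy allocator

    def rbcc_malloc(self, size, tiles):
        """Hypothetical allocator: reserve `size` bytes kept coherent only among `tiles`."""
        if size <= 0 or not tiles:
            raise ValueError("size and tile set must be non-empty")
        # Round up to whole cache lines so two regions never share a line.
        size = -(-size // LINE_SIZE) * LINE_SIZE
        base = self.next_addr
        self.next_addr += size
        self.regions.append((base, size, CoherenceRegion(frozenset(tiles))))
        return base

    def _line(self, tile, addr):
        """Return the cache line of `addr` after checking `tile` belongs to its region."""
        for base, size, region in self.regions:
            if base <= addr < base + size:
                if tile not in region.tiles:
                    # Modeling choice: treat an out-of-region access as an error.
                    raise PermissionError(f"tile {tile} is outside the region of {addr:#x}")
                return addr - addr % LINE_SIZE
        raise KeyError(f"address {addr:#x} was not allocated with rbcc_malloc")

    def read(self, tile, addr):
        """Record `tile` as a sharer of the cache line containing `addr`."""
        line = self._line(tile, addr)
        self.sharers.setdefault(line, set()).add(tile)

    def write(self, tile, addr):
        """Give `tile` exclusive ownership; return the tiles whose copies are invalidated."""
        line = self._line(tile, addr)
        # Sharers can only be region members, so invalidations never leave the region.
        victims = self.sharers.get(line, set()) - {tile}
        self.sharers[line] = {tile}
        return victims


if __name__ == "__main__":
    d = RegionDirectory()
    buf = d.rbcc_malloc(256, tiles={0, 1, 2, 3})  # working set shared by tiles 0-3
    for t in (0, 1, 2):
        d.read(t, buf + 8)
    print(sorted(d.write(3, buf + 8)))  # [0, 1, 2]
```

In this toy model a directory entry only ever needs presence information for the tiles of its own region, which hints at why restricting coherence to regions can reduce directory storage; the 57% BRAM figure above comes from the paper's own hardware CRM, not from this sketch.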
