Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Scalability Analysis of Memory Consistency Models in NoC-Based Distributed Shared Memory SoCs



Abstract

We analyze the scalability of six memory consistency models in network-on-chip (NoC)-based distributed shared memory multicore systems: 1) protected release consistency (PRC); 2) release consistency (RC); 3) weak consistency (WC); 4) partial store ordering (PSO); 5) total store ordering (TSO); and 6) sequential consistency (SC). Their realizations are based on a transaction counter and an address-stack-based approach. The scalability analysis is based on different workloads mapped onto networks of various sizes using different problem sizes. For the experiments, we use the Nostrum NoC-based configurable multicore platform with a 2-D mesh topology and a deflection routing algorithm. Under the synthetic workloads, the average execution time for the PRC, RC, WC, PSO, and TSO models in the 8 × 8 network (64 cores) is reduced by 32.3%, 28.3%, 20.1%, 13.8%, and 9.9%, respectively, relative to the SC model. For the application workloads, as the network size grows, the average execution time under these relaxed memory models decreases with respect to the SC model, by an amount that depends on the application and its match to the architecture. The experiments indicate that the performance improvement of the PRC and RC models over the SC model tends to exceed 50% when the system is scaled up further. The area cost in the network interface for the relaxed memory models is less than 4% higher than for the SC model.
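To put the synthetic-workload figures in more familiar terms, the reported execution-time reductions can be converted into speedup factors over SC. The sketch below is only illustrative: it assumes that "reduced by r%" means the relaxed model's average execution time equals (1 - r/100) times the SC time, and the names `REDUCTION_VS_SC` and `speedup_over_sc` are hypothetical, not taken from the paper.

```python
# Average execution-time reductions relative to SC, as quoted in the abstract
# (synthetic workloads, 8 x 8 network, 64 cores). Values are percentages.
REDUCTION_VS_SC = {"PRC": 32.3, "RC": 28.3, "WC": 20.1, "PSO": 13.8, "TSO": 9.9}


def speedup_over_sc(reduction_percent: float) -> float:
    """Speedup implied by an execution-time reduction.

    Assumption (not stated in the abstract): T_model = (1 - r/100) * T_SC,
    so speedup = T_SC / T_model = 1 / (1 - r/100).
    """
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


def implied_speedups() -> dict:
    """Map each relaxed model to its implied speedup factor over SC."""
    return {model: speedup_over_sc(r) for model, r in REDUCTION_VS_SC.items()}


if __name__ == "__main__":
    for model, factor in implied_speedups().items():
        print(f"{model}: {factor:.2f}x over SC")
```

Under this reading, the 32.3% reduction for PRC corresponds to a speedup of roughly 1.48 over SC, and a 50% reduction would correspond to a factor of 2.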
