Design of an interconnect topology for multi-cores and scale-out workloads

International Conference on Signal Processing, Communications and Networking

Abstract

Scale-out workloads are applications that typically execute in a cloud environment and exhibit a high level of request-level parallelism. Such workloads benefit from processor organizations with very high core counts, since multiple requests can be serviced simultaneously by threads running on these cores. These workloads have large instruction footprints that exceed the capacities of private caches, operate on large datasets with limited reuse, and show minimal coherence activity because little data is shared. Their characteristics also indicate that the active instruction window can be captured by a Last Level Cache (LLC) of 8 MB. New processor organizations have been discussed in the literature that tailor the interconnection among cores to match the communication patterns arising from the characteristics of scale-out workloads. The focus of the current work is to take the approach, specified in the literature, of separating the core and the LLC bank of a single tile, and to design a different interconnection topology for cores and LLC banks that reduces the latency of accessing the LLC and thereby improves performance. In the current work, four cores and an LLC bank connect to a router, forming a star topology, and the routers (>4) form a 2D flattened butterfly topology. The current design targets 8 cores; it has been implemented in Bluespec SystemVerilog (a hardware description language) and synthesized using Xilinx Vivado 2013.2 targeting the Zynq-7000 family of FPGAs. The design has been evaluated under different amounts of offered traffic, and the average latency and throughput of the interconnection network have been calculated for a uniform random traffic pattern. An injection rate of 0.05 packets/cycle/core, which corresponds to the maximum L2 miss rate for scale-out workloads, gives an average packet latency of 29.5 clock cycles and a throughput of 0.52 packets/cycle.
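
The paper's Bluespec SystemVerilog sources are not included here. As a rough illustration of the kind of evaluation the abstract describes, the Python sketch below simulates Bernoulli injection of uniform random traffic on a two-router instance of the design (four cores and an LLC bank per router in a star, with a single inter-router link, which is what a flattened butterfly degenerates to at two routers). The core and router counts and the 0.05 packets/cycle/core injection rate come from the abstract; ROUTER_DELAY, LINK_DELAY, SIM_CYCLES, and the queuing model are assumptions made for illustration, so the printed numbers will not reproduce the reported 29.5 cycles and 0.52 packets/cycle.

    import random
    from collections import deque

    # Parameters for this sketch. NUM_CORES, CORES_PER_ROUTER, and the
    # injection rate come from the abstract; the per-hop delays and the
    # simulation length are illustrative assumptions, not the paper's values.
    NUM_CORES = 8
    CORES_PER_ROUTER = 4
    NUM_ROUTERS = NUM_CORES // CORES_PER_ROUTER  # 2 routers for the 8-core design
    INJECTION_RATE = 0.05   # packets/cycle/core (max L2 miss rate per the abstract)
    ROUTER_DELAY = 2        # assumed router pipeline latency, in cycles
    LINK_DELAY = 1          # assumed inter-router link latency, in cycles
    SIM_CYCLES = 100_000

    def simulate(seed=0):
        rng = random.Random(seed)
        # One FIFO per router holding packets waiting on its inter-router link;
        # each entry records the cycle the packet was injected.
        link_queues = [deque() for _ in range(NUM_ROUTERS)]
        latencies = []
        for cycle in range(SIM_CYCLES):
            # Bernoulli injection: each core sources a packet with probability
            # INJECTION_RATE, addressed to a uniformly random core.
            for core in range(NUM_CORES):
                if rng.random() < INJECTION_RATE:
                    src_router = core // CORES_PER_ROUTER
                    dst_router = rng.randrange(NUM_CORES) // CORES_PER_ROUTER
                    if src_router == dst_router:
                        # Local delivery through the star: one router traversal.
                        latencies.append(ROUTER_DELAY)
                    else:
                        link_queues[src_router].append(cycle)
            # Each inter-router link forwards at most one packet per cycle,
            # so contention shows up as queuing delay.
            for queue in link_queues:
                if queue:
                    injected = queue.popleft()
                    queuing = cycle - injected
                    # Two router traversals plus one link hop plus queuing.
                    latencies.append(2 * ROUTER_DELAY + LINK_DELAY + queuing)
        avg_latency = sum(latencies) / max(len(latencies), 1)
        throughput = len(latencies) / SIM_CYCLES  # delivered packets/cycle
        return avg_latency, throughput

    if __name__ == "__main__":
        lat, thr = simulate()
        print(f"avg latency: {lat:.1f} cycles, throughput: {thr:.2f} packets/cycle")

Modeling each inter-router link as a one-packet-per-cycle FIFO captures why latency grows with offered traffic: at low injection rates the queues are almost always empty and latency is dominated by the fixed router and link delays, which is consistent with the low average latency the paper reports at the maximum expected L2 miss rate.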