Conference: Networks-on-Chip, 2009. NoCS 2009

A GALS many-core heterogeneous DSP platform with source-synchronous on-chip interconnection network



Abstract

This paper presents a many-core heterogeneous computational platform that employs a GALS-compatible circuit-switched on-chip network. The platform targets streaming DSP and embedded applications that have a high degree of task-level parallelism among computational kernels. The test chip was fabricated in 65 nm CMOS and consists of 164 simple, small programmable cores, three dedicated-purpose accelerators and three shared memory modules. Every processor is clocked by its own local oscillator, and communication uses a simple yet effective source-synchronous technique that allows each interconnection link between any two processors to sustain a peak throughput of one data word per cycle. A complete 802.11a WLAN baseband receiver was implemented on this platform. With all processors running at 594 MHz and 0.95 V, it achieves a real-time throughput of 54 Mbps and consumes an average of 174.76 mW, of which 12.18 mW (7.0%) is dissipated by its interconnection links. To fully exploit the GALS architecture, we adjust each processor's oscillator to run at a workload-based optimal clock frequency, with the chip's dual supply voltages set to 0.95 V and 0.75 V; in this configuration the receiver consumes only 123.18 mW, a 29.5% power reduction. Power consumption measured on the real chip differs from the estimated results by only 2-5%, showing our design to be highly reliable and efficient.
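The two percentages quoted in the abstract follow directly from the reported power figures. The short Python sketch below recomputes them as a sanity check. The constant names and the `percent` helper are illustrative only and do not come from the paper; the input values are exactly the ones stated in the abstract.

```python
# Power figures reported in the abstract for the 802.11a receiver (in mW).
BASELINE_MW = 174.76   # all processors at 594 MHz, 0.95 V
LINK_MW = 12.18        # portion of the baseline dissipated by interconnection links
OPTIMIZED_MW = 123.18  # per-processor clock tuning, dual supplies at 0.95 V / 0.75 V


def percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of the baseline power spent on the on-chip links.
link_share = percent(LINK_MW, BASELINE_MW)

# Relative saving obtained by moving from the baseline to the optimized setting.
savings = percent(BASELINE_MW - OPTIMIZED_MW, BASELINE_MW)

if __name__ == "__main__":
    print(f"interconnect share: {link_share}%")
    print(f"power reduction:    {savings}%")
```

Running it prints an interconnect share of 7.0% and a power reduction of 29.5%, matching the figures in the abstract.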
