Design, Automation & Test in Europe Conference and Exhibition

Time-critical computing on a single-chip massively parallel processor

Abstract

The requirement of high-performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. We illustrate how this problem has been addressed by suitably designing the architecture, implementation, and programming model of the Kalray MPPA®-256 single-chip many-core processor. The MPPA®-256 (Multi-Purpose Processing Array) processor integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These VLIW cores are distributed across 16 compute clusters and 4 I/O subsystems, each with a locally shared memory. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem. Off-chip interfaces include DDR, PCI and Ethernet, and direct access to the NoC for low-latency processing of data streams. The key architectural features that support time-critical applications are timing-compositional cores, independent memory banks inside the compute clusters, and the data NoC, whose guaranteed services are determined by network calculus. The programming model provides communicators that effectively support distributed computing primitives such as remote writes, barrier synchronizations, active messages, and communication by sampling. POSIX time functions expose synchronous clocks inside compute clusters and mesosynchronous clocks across the MPPA®-256 processor.
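
The abstract states that the data NoC's guaranteed services are determined by network calculus, but gives no formulas. As a minimal sketch of what such a bound looks like, the Python below applies the standard single-flow textbook result: a flow constrained by a token-bucket arrival curve (burst `b`, rate `r`) crossing a rate-latency server (rate `R`, latency `T`) has delay at most `T + b/R` and backlog at most `b + r*T`, provided `r <= R`. The class names, units, and numeric values are hypothetical illustrations and are not taken from the paper; the actual MPPA®-256 analysis may differ, for example by accounting for several flows sharing links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t (hypothetical units: flits, flits/cycle)."""
    burst: float
    rate: float


@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = rate * max(0, t - latency) (hypothetical units: flits/cycle, cycles)."""
    rate: float
    latency: float


def delay_bound(flow: TokenBucket, service: RateLatency) -> float:
    """Worst-case delay T + b/R for a token-bucket flow on a rate-latency server."""
    if flow.rate > service.rate:
        raise ValueError("flow rate exceeds service rate: no finite bound")
    return service.latency + flow.burst / service.rate


def backlog_bound(flow: TokenBucket, service: RateLatency) -> float:
    """Worst-case buffered data b + r*T under the same assumptions."""
    if flow.rate > service.rate:
        raise ValueError("flow rate exceeds service rate: no finite bound")
    return flow.burst + flow.rate * service.latency


if __name__ == "__main__":
    # Illustrative numbers only, not measurements of any real NoC.
    flow = TokenBucket(burst=8.0, rate=0.25)
    link = RateLatency(rate=1.0, latency=10.0)
    print("delay bound (cycles):", delay_bound(flow, link))
    print("backlog bound (flits):", backlog_bound(flow, link))
```

Bounds of this kind depend only on the declared traffic envelope and the service guarantee, not on the behaviour of other traffic at run time, which is what makes them usable for time-critical dimensioning.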