Journal: Microprocessors and Microsystems

Scalable embedded computing through reconfigurable hardware: Comparing DF-Threads, Cilk, OpenMPI and Jump

Abstract

Data-Flow Threads (DF-Threads) is a new execution model that allows the workload to be distributed seamlessly across several cores (in a multi-core) and several nodes (in a multi-node/multi-board configuration). This paper shows the progress made in deploying this execution model, which is being developed using a combination of a simulator (i.e., the COTSon framework) and a reconfigurable hardware platform (i.e., the AXIOM-board). The AXIOM platform consists of a custom board based on the Xilinx Zynq UltraScale+ ZU9EG, which incorporates the largest FPGA available on that System-on-Chip at the moment, four 64-bit ARM cores and two 32-bit ARM cores, up to 32 GiB of main memory and several 16 Gbit/s transceivers. Although a complete DF-Threads system is still under development, it is already capable of running a full Linux OS and simple applications, so some initial results are presented here. In particular, well-known programming models used to exploit Thread-Level Parallelism, such as Cilk, OpenMPI and Jump, are compared with DF-Thread execution. Cilk is good for multi-cores, but it is not suitable for multi-node systems. In the latter case, the distribution of the workload can be managed partly by the programmer when using programming models such as message passing (OpenMPI has been chosen for reference) or distributed shared memory (Jump in our case). The results show that a DF-Thread execution on a cluster of eight 4-core boards can provide a speed-up of more than 14x compared to the same configuration using OpenMPI, and of more than 80x compared with an OpenMPI single-core, single-node execution.
