Conference: International Workshop on Intelligent Solutions in Embedded Systems

Load balancing, broadcast, and scatter primitives for efficient multicore applications

Abstract

Efficient parallel execution of scientific and transaction-oriented applications requires reducing communication and synchronization overheads by improving locality through explicit methods that capture the underlying access patterns. In this work, we propose low-cost hardware that supports load balancing and parallel broadcast/scatter macro-operations. We evaluate these primitives using a cycle-accurate SystemC virtual platform of a multicore System-on-Chip (SoC) that interconnects cycle-accurate processor models (Cortex-A9) and a memory hierarchy via a hypercube Network-on-Chip (NoC). Results from executing a typical parallel matrix multiplication benchmark on a small-range embedded multicore SoC indicate average execution time improvements of 25% for load balancing, 21% for the broadcast/scatter primitives, and 50% when both primitives are used together. While load balancing relies only on remote shared-memory access principles, synthesis on the Zedboard's Zynq 7020 FPGA indicates a very low area cost for the scatter operation compared to an industrial DMA-based scatter/gather solution.
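As a rough software analogy for the benchmark described in the abstract, the sketch below scatters row blocks of one matrix, shares (broadcasts) the other matrix with every worker, and balances load by letting workers pull blocks from a shared queue. This is not the authors' hardware design: the function names (`scatter_rows`, `multiply_block`, `parallel_matmul`), the thread-based workers, and the block counts are all assumptions made for illustration only.

```python
import queue
import threading


def scatter_rows(matrix, n_parts):
    """Split rows into n_parts contiguous blocks whose sizes differ by at most one.

    Returns a list of (start_row_index, rows) pairs. Hypothetical helper.
    """
    base, extra = divmod(len(matrix), n_parts)
    blocks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        blocks.append((start, matrix[start:start + size]))
        start += size
    return blocks


def multiply_block(rows, b):
    """Multiply a block of rows of A by the full (broadcast) matrix B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in rows]


def parallel_matmul(a, b, n_workers=4, n_blocks=8):
    """Compute A x B with scattered row blocks and a shared work queue.

    Idle workers pull the next block, which is a simple software form of
    dynamic load balancing (an assumption, not the paper's mechanism).
    """
    work = queue.Queue()
    for item in scatter_rows(a, n_blocks):
        work.put(item)
    result = [None] * len(a)

    def worker():
        while True:
            try:
                start, rows = work.get_nowait()
            except queue.Empty:
                return  # no blocks left
            # Each block writes to a disjoint slice of the result.
            for offset, out_row in enumerate(multiply_block(rows, b)):
                result[start + offset] = out_row

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Using more blocks than workers (here 8 blocks for 4 workers by default) is what gives the queue room to even out uneven per-block costs; with one block per worker the scheme degenerates to a static partition.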
