Conference: International Conference on Reconfigurable Computing and FPGAs

Statistical performance of the ARM Cortex-A9 accelerator coherency port in the Xilinx Zynq SoC for real-time applications

Abstract

Using the Xilinx Zynq SoC, this work extends previous work by analysing and quantifying the effects of various outer (L2) caching behaviors and memory ordering models on memory accesses from a hardware accelerator (HA) implemented in programmable logic (PL) and from one of the two ARM Cortex-A9 CPUs. Memory accesses to the L2 cache/external memory and to on-chip memory (OCM) are both considered. The HA is configured to perform either coherent or non-coherent memory accesses through the accelerator coherency port (ACP), using full AXI4 transactions with bursts of 256 64-bit words. The L1 caches of the CPU are configured as write-back/no-write-allocate for all operations on memory with the normal ordering model. The effect of a dummy task executing on the CPU is also considered. Performance is measured as the turnaround time of memory accesses, with writes and reads measured separately. The numerical results are presented as standard deviation, maximum, and mean values, which are the figures of interest for real-time applications. All experiments are executed on the Avnet ZedBoard for 4,000 iterations with a 64 KB data payload. Memory accesses to either OCM or external memory, from either the CPU or the ACP, are shown to have similar performance, but only under specific behaviors of the memory hierarchy. It is also shown that memory whose ordering model is configured as device can hold several advantages over the normal and strongly-ordered models.
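
The arithmetic behind the setup and the reported statistics can be sketched as follows. This is a minimal illustration, not the authors' measurement code. The function names are hypothetical, 64 KB is assumed to mean 65,536 bytes, and the population standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
import statistics

# Values taken from the abstract; KB = 1024 bytes is an assumption.
PAYLOAD_BYTES = 64 * 1024   # 64 KB data payload
WORDS_PER_BURST = 256       # 256 words per AXI4 burst
BYTES_PER_WORD = 8          # one 64-bit word


def bursts_per_payload(payload_bytes=PAYLOAD_BYTES):
    """Number of full-length bursts needed to move one payload (hypothetical helper)."""
    burst_bytes = WORDS_PER_BURST * BYTES_PER_WORD  # 2048 bytes per burst
    return -(-payload_bytes // burst_bytes)         # ceiling division


def summarize(turnaround_times):
    """Mean, maximum, and standard deviation of per-iteration turnaround times.

    Uses the population standard deviation (an assumption; the abstract
    does not specify the estimator).
    """
    return {
        "mean": statistics.fmean(turnaround_times),
        "max": max(turnaround_times),
        "std": statistics.pstdev(turnaround_times),
    }
```

Under these assumptions one payload corresponds to 32 bursts, and `summarize` would be applied separately to the read and write turnaround times collected over the 4,000 iterations. The maximum matters for real-time use because it bounds the worst observed case, while the standard deviation indicates jitter.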
