Journal: Microprocessors and Microsystems

Memory streaming acceleration for embedded systems with CPU-accelerator cooperative data processing

Abstract

Memory streaming operations (i.e., memory-to-memory data transfers with or without simple arithmetic/logical operations) are among the most important tasks in general embedded/mobile computer systems. In this paper, we propose a technique to accelerate them. The conventional approach employs direct memory access (DMA) with dedicated hardware accelerators for the simple arithmetic/logical operations. Our technique uses not only a hardware accelerator with DMA but also the central processing unit (CPU) to perform memory streaming operations, which improves the performance and energy efficiency of the system. We implemented a prototype on a field-programmable gate array system-on-chip (FPGA-SoC) platform and evaluated the technique with real measurements on that prototype. In our experiments, the technique improves memory streaming performance by 34.1-73.1% while reducing energy consumption by 29.0-45.5%. Applied to real-world workloads such as image processing, 1 x 1 convolution, and bias addition/scale, it improves performance by 1.1x-2.4x. It also reduces energy consumption for image processing, 1 x 1 convolution, and bias addition/scale by 7.9-17.7%, 46.8-57.7%, and 41.7-58.5%, respectively. (C) 2019 Elsevier B.V. All rights reserved.
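
The cooperative idea in the abstract can be illustrated with a small software sketch. It is not the paper's implementation: the abstract does not say how work is divided between the CPU and the accelerator, so the static split ratio (`accel_fraction`), the function name `cooperative_stream`, and the use of a Python thread as a stand-in for the DMA-fed accelerator are all assumptions made for illustration only.

```python
import threading


def cooperative_stream(src, dst, accel_fraction=0.5, op=lambda x: x):
    """Stream src into dst, applying op to each element.

    The first part of the buffer is handled by a stand-in "accelerator"
    thread, the rest by the calling "CPU" thread. The static split is a
    hypothetical policy, not one taken from the paper.
    """
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    split = int(len(src) * accel_fraction)  # assumed partition point

    def accel_worker():
        # Stand-in for the DMA + hardware accelerator path.
        dst[:split] = [op(x) for x in src[:split]]

    worker = threading.Thread(target=accel_worker)
    worker.start()
    # The CPU processes its own share instead of idling during the transfer.
    dst[split:] = [op(x) for x in src[split:]]
    worker.join()
    return split
```

A bias-addition style call would look like `cooperative_stream(src, dst, 0.25, lambda x: x + 3)`. In a real system the split would presumably depend on the relative throughput of the two paths, which this sketch does not model.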
