IEEE Transactions on Very Large Scale Integration (VLSI) Systems

MERIT: Tensor Transform for Memory-Efficient Vision Processing on Parallel Architectures

Abstract

Computationally intensive deep neural networks (DNNs) are well-suited to run on GPUs, but newly developed algorithms usually require heavily optimized DNN routines to work efficiently, and the problem is even harder for specialized DNN architectures. In this article, we propose a mathematical formulation that helps transfer algorithm-optimization knowledge across computing platforms. We observe that data movement and storage inside parallel processor architectures can be viewed as tensor transforms across memory hierarchies, which makes it possible to describe many memory-optimization techniques mathematically. This transform, which we call the memory-efficient ranged inner-product tensor (MERIT) transform, applies not only to DNN tasks but also to many traditional machine learning and computer vision computations. Moreover, the tensor transforms map readily onto existing vector processor architectures. We demonstrate that many popular applications can be expressed in a succinct MERIT notation on GPUs, speeding up GPU kernels by up to 20 times while using only half as many code tokens. We also use the principle of the proposed transform to design a specialized hardware unit, the MERIT-z processor, which can run a variety of DNN tasks as well as other computer vision tasks while providing area and power efficiency comparable to dedicated DNN application-specific integrated circuits (ASICs).
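The abstract's central idea, that data movement can be expressed as a tensor "view" transform followed by a plain inner product, has a familiar instance outside the paper: im2col-style convolution, where an overlapping-window view turns convolution into a single matrix multiply. The sketch below is an illustrative NumPy analogy, not the authors' MERIT implementation; the function name and shapes are our own.

```python
import numpy as np

def im2col_conv2d(x, w):
    """Convolve x (H, W) with kernel w (kh, kw) via a strided view + matmul.

    Illustrates the view-transform-then-inner-product pattern: the window
    view changes only strides (no data copy), and the contraction is one
    ordinary matrix-vector product.
    """
    kh, kw = w.shape
    H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Overlapping-window view of x, shape (oh, ow, kh, kw): a pure
    # index/stride transform, the "tensor transform" half of the pattern.
    view = np.lib.stride_tricks.sliding_window_view(x, (kh, kw))
    # Ranged inner product: contract the window axes against the kernel.
    return (view.reshape(oh * ow, kh * kw) @ w.reshape(-1)).reshape(oh, ow)

x = np.arange(16.0).reshape(4, 4)
w = np.ones((2, 2))          # box filter: each output is a 2x2 window sum
out = im2col_conv2d(x, w)    # shape (3, 3)
```

On a GPU the same separation matters because the view transform determines memory-access patterns (coalescing, shared-memory tiling) while the inner product is the compute kernel; describing the former mathematically is what lets one optimization carry across platforms.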
