Conference proceedings: Image and Video Communications and Processing 2003

A new scalable systolic array processor architecture for simultaneous discrete convolution of k different (n x n) filter coefficient planes with a single image plane

Abstract

A new high-performance, scalable systolic array processor architecture module is presented that can simultaneously convolve k different (n x n) Filter Coefficient (FC) planes with a single (i x j) pixel Input Image Plane (IP). The architecture is designed to convolve k different (n x n) FC planes simultaneously with 600 dpi (dots per inch) IPs of size 8½" x 11", at a rate of k convolved Output Image (OI) plane pixels per system clock cycle, with a system clock cycle time of less than 10 nanoseconds. Bit-parallel arithmetic is used; each IP pixel is 8 bits wide and each FC plane coefficient is 6 bits wide. A new pipelined, systolic-type architecture module is first developed that generates one convolved OI plane pixel per system clock cycle using a level "r" of hardware resources for the case n = 5. The architecture is then extended, in a scalable and more deeply pipelined manner, to convolve a single IP pixel with k different (n x n) FC planes (n = 5) within one system clock cycle while using fewer than (k x r) hardware resources. Synthesis and post-implementation VHDL simulation results are shown for an experimental model of the architecture, which validate its scalability and functionality. Simulation results demonstrate that the performance of the architecture is directly proportional to pipeline depth.
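
The headline figures can be sanity-checked with a little arithmetic. The sketch below is a back-of-envelope calculation, not taken from the paper: it assumes a full 8.5" x 11" page sampled at 600 dpi with no margins, a clock period of exactly the 10 ns bound, one IP pixel consumed per clock (pipeline fill latency ignored), unsigned operands, and exact accumulation of all n x n products with no truncation. The helper names `page_pixels` and `exact_sum_bits` are illustrative.

```python
# Back-of-envelope check of the abstract's throughput and word-width figures.
# Assumptions (not from the paper): full 8.5" x 11" page with no margins,
# clock period taken as exactly the 10 ns bound, unsigned operands,
# exact (untruncated) accumulation of all n*n products.

DPI = 600
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0
CLOCK_NS = 10.0      # abstract: cycle time "less than 10 ns"
PIXEL_BITS = 8       # IP pixel width (from the abstract)
COEFF_BITS = 6       # FC coefficient width (from the abstract)
N = 5                # filter size used for the module (from the abstract)


def page_pixels(dpi, width_in, height_in):
    """Pixel dimensions (columns, rows) of a page sampled at `dpi`."""
    return round(dpi * width_in), round(dpi * height_in)


def exact_sum_bits(pixel_bits, coeff_bits, taps):
    """Bits needed to hold the largest possible sum of `taps` unsigned products."""
    max_sum = taps * (2**pixel_bits - 1) * (2**coeff_bits - 1)
    return max_sum.bit_length()


cols, rows = page_pixels(DPI, PAGE_W_IN, PAGE_H_IN)
pixels = cols * rows
# One IP pixel enters per clock, so a page takes about pixels * T_clk;
# the k output planes are produced in parallel, so k does not change this time.
seconds_per_page = pixels * CLOCK_NS * 1e-9
product_bits = exact_sum_bits(PIXEL_BITS, COEFF_BITS, 1)
acc_bits = exact_sum_bits(PIXEL_BITS, COEFF_BITS, N * N)

print(f"page: {cols} x {rows} = {pixels:,} pixels")
print(f"time per page at {CLOCK_NS} ns/clock: {seconds_per_page:.4f} s")
print(f"product width: {product_bits} bits, {N}x{N} accumulator width: {acc_bits} bits")
```

Under these assumptions a page is 5100 x 6600 = 33,660,000 pixels, or about 0.34 s per page at 10 ns per clock, and an exact 5 x 5 sum of 14-bit products needs 14 + ⌈log2 25⌉ = 19 bits. Signed coefficients or truncated partial sums would require the widths to be re-derived.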
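
The abstract gives no internal details of the module, so the following is only a hypothetical Python sketch of one common way to meet the stated interface, not the authors' systolic design. It models a raster-scan convolver that accepts one IP pixel per clock and, once its window is full, returns k results per clock. The n - 1 line buffers and the n x n window register are shared by all k FC planes, and only the multiply-accumulate step is repeated per plane; sharing of this kind is one plausible reason a k-plane module could need fewer than k x r resources. `StreamingKPlaneConvolver`, `stream_planes` and `convolve_valid` are illustrative names. The model computes only the "valid" output region (no border handling) and uses true convolution, so each kernel is flipped before use.

```python
from collections import deque
import random


def convolve_valid(img, h):
    """Direct 2D discrete convolution (kernel flipped), 'valid' region only."""
    n = len(h)
    rows, cols = len(img), len(img[0])
    return [[sum(img[i + u][j + v] * h[n - 1 - u][n - 1 - v]
                 for u in range(n) for v in range(n))
             for j in range(cols - n + 1)]
            for i in range(rows - n + 1)]


class StreamingKPlaneConvolver:
    """Cycle-level sketch: one IP pixel in per clock, k outputs per clock once valid.

    Hypothetical structure: the n-1 line buffers and the n x n window register are
    shared by all k filter planes; only the multiply-accumulate step is per plane.
    """

    def __init__(self, kernels, width):
        self.n = len(kernels[0])
        self.width = width
        # Pre-flip every kernel so window position (r, c) multiplies the right tap.
        self.kernels = [[row[::-1] for row in k[::-1]] for k in kernels]
        # Each line buffer is a width-deep FIFO: its output is the pixel one row above.
        self.lines = [deque([0] * width) for _ in range(self.n - 1)]
        # window[r] holds the last n pixels of row r of the n x n neighbourhood.
        self.window = [deque([0] * self.n, maxlen=self.n) for _ in range(self.n)]
        self.x = self.y = 0

    def clock(self, pixel):
        """Advance one clock with `pixel` (raster order); return k results or None."""
        x, y = self.x, self.y
        column = [pixel]            # column[d] = pixel d rows above, same column
        for fifo in self.lines:
            fifo.append(column[-1])
            column.append(fifo.popleft())
        for r in range(self.n):     # oldest row goes to window row 0
            self.window[r].append(column[self.n - 1 - r])
        self.x += 1
        if self.x == self.width:
            self.x, self.y = 0, y + 1
        if x < self.n - 1 or y < self.n - 1:
            return None             # window not yet filled with valid pixels
        return [sum(self.window[r][c] * k[r][c]
                    for r in range(self.n) for c in range(self.n))
                for k in self.kernels]


def stream_planes(img, kernels):
    """Feed `img` pixel by pixel and regroup the per-clock results into k planes."""
    n, width = len(kernels[0]), len(img[0])
    conv = StreamingKPlaneConvolver(kernels, width)
    flat = []
    for row in img:
        for pixel in row:
            out = conv.clock(pixel)
            if out is not None:
                flat.append(out)
    out_cols = width - n + 1
    return [[[flat[i][p] for i in range(r * out_cols, (r + 1) * out_cols)]
             for r in range(len(flat) // out_cols)]
            for p in range(len(kernels))]


# Demo with the abstract's parameters: n = 5, 8-bit pixels, 6-bit coefficients.
rng = random.Random(0)
n, k, rows, cols = 5, 3, 9, 12
img = [[rng.randrange(256) for _ in range(cols)] for _ in range(rows)]
kernels = [[[rng.randrange(64) for _ in range(n)] for _ in range(n)] for _ in range(k)]
planes = stream_planes(img, kernels)
print(all(planes[p] == convolve_valid(img, kernels[p]) for p in range(k)))
```

The demo streams a random 9 x 12 image through three random 5 x 5 planes and checks every result against the direct formula. In hardware the k sums would typically be formed by parallel adder trees or chained MAC cells rather than a software loop.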
