Conference on Image and Video Communications and Processing

A new scalable systolic array processor architecture for simultaneous discrete convolution of k different (n x n) filter coefficient planes with a single image plane



Abstract

A new high-performance, scalable systolic array processor architecture module is presented that can simultaneously convolve k different (n x n) Filter Coefficient (FC) planes with a single (i x j) pixel Input Image Plane (IP). The architecture is designed to convolve k different (n x n) FC planes with 600 dpi (dots per inch) IPs of size 8.5" x 11" at a rate such that k convolved Output Image (OI) plane pixels are output every system clock cycle, for a system clock cycle time of less than 10 nanoseconds. Bit-parallel arithmetic is used; each IP pixel is 8 bits long and each FC plane coefficient is 6 bits long. A new pipelined systolic architecture module is first developed that generates one convolved OI plane pixel per system clock cycle using a level of "r" hardware resources for the case n = 5. The architecture is then extended in a scalable and more deeply pipelined manner to allow simultaneous convolution of a single IP pixel with k different (n x n) FC planes, again for n = 5, within one system clock cycle, using less than (k x r) hardware resources. Synthesis and post-implementation VHDL simulation results are shown for an experimental model of the architecture, which validate its scalability and functionality. Simulation results demonstrate that the performance of the architecture is directly proportional to pipeline depth.
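The computation the abstract describes can be illustrated with a small software reference model. The sketch below is not the systolic architecture itself: it is a plain behavioural model showing that each (n x n) image window is read once and shared by all k FC planes, plus the page-size arithmetic implied by the 600 dpi and 10 ns figures. The function names `convolve_k_planes` and `page_time_seconds` are hypothetical, and several details are assumptions the abstract does not specify: only the "valid" output region is computed (no border padding), the kernel is flipped as in a true convolution, and pixels and coefficients are treated as plain integers with no fixed bit-width or sign convention.

```python
# Behavioural reference model only; it does not model the pipeline or the
# hardware resource sharing described in the abstract.

def convolve_k_planes(image, filters):
    """Convolve one image plane with k (n x n) filter planes.

    Assumptions (not from the abstract): 'valid' output region only,
    kernel flipped (true convolution), unbounded integer accumulation.
    Returns a list of k output planes.
    """
    n = len(filters[0])
    rows, cols = len(image), len(image[0])
    out_rows, out_cols = rows - n + 1, cols - n + 1
    outputs = [[[0] * out_cols for _ in range(out_rows)] for _ in filters]
    for r in range(out_rows):
        for c in range(out_cols):
            # The n x n window is fetched once and reused by every filter.
            window = [image[r + dr][c:c + n] for dr in range(n)]
            for k, fc in enumerate(filters):
                acc = 0
                for dr in range(n):
                    for dc in range(n):
                        acc += window[dr][dc] * fc[n - 1 - dr][n - 1 - dc]
                outputs[k][r][c] = acc
    return outputs


def page_time_seconds(dpi=600, width_in=8.5, height_in=11.0, clock_ns=10.0):
    """Pixel count of a page and the time to stream it at one pixel per clock.

    Ignores pipeline fill latency and border handling (assumptions).
    """
    pixels = round(dpi * width_in) * round(dpi * height_in)
    return pixels, pixels * clock_ns * 1e-9
```

With the default arguments, `page_time_seconds()` gives 5100 x 6600 = 33,660,000 pixels, so at one input pixel per 10 ns clock a full page streams through in roughly 0.34 s. Because the abstract states a cycle time of less than 10 ns, this is an upper bound under the stated assumptions.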
