Journal: Journal of Real-Time Image Processing

Highly efficient image registration for embedded systems using a distributed multicore DSP architecture

Abstract

We present a complete approach to highly efficient image registration for embedded systems, covering all steps from theory to practice. An optimization-based image registration algorithm using a least-squares data term is implemented on an embedded distributed multicore digital signal processor (DSP) architecture. All relevant parts are optimized, ranging from mathematics, algorithmics, and data transfer to hardware architecture and electronic components. The optimization for the rigid alignment of two-dimensional images is performed in a multilevel Gauss-Newton minimization framework. We propose a reformulation of the necessary derivative computations, which eliminates all sparse matrix operations and allows for parallel, memory-efficient computation. The pixelwise parallelism forms an ideal starting point for our implementation on a multicore, multichip DSP architecture. Reducing the data transfer to the individual DSP chips is key to efficient computation. By determining worst cases for the subimages needed on each DSP, we can substantially reduce data transfer and memory requirements. This is accompanied by a sophisticated padding mechanism that eliminates pipeline hazards and speeds up the generation of the multilevel pyramid. Finally, we present a reference hardware architecture consisting of four TI C6678 DSPs with eight cores each. We show that it is possible to register high-resolution images within milliseconds on an embedded device. In our example, we register two images of 4096 × 4096 pixels within 93 ms, while off-loading the CPU by a factor of 20 and requiring 3.12 times less electrical energy.
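
To illustrate the idea of a matrix-free, pixelwise Gauss-Newton step, here is a minimal single-level sketch in plain Python. It is not the authors' DSP implementation, and it does not reproduce their reformulation. It only shows the general principle: for a rigid transform with parameters w = (theta, tx, ty), each pixel i contributes a residual r_i and a 1 x 3 Jacobian row J_i, and the update solves the 3 x 3 system H dw = -g with H = sum_i J_i^T J_i and g = sum_i J_i^T r_i, so no sparse matrix is ever stored. Everything here is an assumption made for the example: rotation about the image centre, bilinear interpolation, a step-halving safeguard, the synthetic test images, and all function names (`blob`, `make_pair`, `bilinear`, `accumulate`, `solve3`, `register`). The multilevel pyramid, padding, and data distribution described in the abstract are omitted.

```python
import math

def blob(x, y):
    """Synthetic test pattern (made up for this sketch): two Gaussian blobs."""
    return (math.exp(-((x - 10) ** 2 + (y - 12) ** 2) / (2 * 3.0 ** 2))
            + 0.7 * math.exp(-((x - 22) ** 2 + (y - 19) ** 2) / (2 * 2.5 ** 2)))

def make_pair(n=32, w_true=(0.04, 1.2, -0.8)):
    """Build reference R and template T such that T(y(x; w_true)) = R(x)."""
    theta, tx, ty = w_true
    s, c, ctr = math.sin(theta), math.cos(theta), (n - 1) / 2.0
    ref = [[blob(j, i) for j in range(n)] for i in range(n)]
    def warped(j, i):  # inverse rigid map: shift back, rotate by -theta
        qx, qy = j - ctr - tx, i - ctr - ty
        return blob(c * qx + s * qy + ctr, -s * qx + c * qy + ctr)
    tmpl = [[warped(j, i) for j in range(n)] for i in range(n)]
    return ref, tmpl

def bilinear(img, x, y):
    """Value and gradient of the bilinear interpolant, or None outside."""
    x0, y0 = math.floor(x), math.floor(y)
    if x0 < 0 or y0 < 0 or x0 >= len(img[0]) - 1 or y0 >= len(img) - 1:
        return None
    fx, fy = x - x0, y - y0
    a, b = img[y0][x0], img[y0][x0 + 1]
    c, d = img[y0 + 1][x0], img[y0 + 1][x0 + 1]
    val = (a * (1 - fx) + b * fx) * (1 - fy) + (c * (1 - fx) + d * fx) * fy
    gx = (b - a) * (1 - fy) + (d - c) * fy
    gy = (c - a) * (1 - fx) + (d - b) * fx
    return val, gx, gy

def accumulate(ref, tmpl, w):
    """One pixelwise pass: objective, H = J^T J (3x3) and g = J^T r."""
    theta, tx, ty = w
    s, c = math.sin(theta), math.cos(theta)
    rows, cols = len(ref), len(ref[0])
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0
    obj, H, g = 0.0, [[0.0] * 3 for _ in range(3)], [0.0] * 3
    for i in range(rows):
        for j in range(cols):
            dx, dy = j - cx, i - cy
            sample = bilinear(tmpl, c * dx - s * dy + cx + tx,
                              s * dx + c * dy + cy + ty)
            if sample is None:      # transformed point left the template
                continue
            val, gx, gy = sample
            r = val - ref[i][j]
            # Jacobian row of this pixel with respect to (theta, tx, ty).
            J = (gx * (-s * dx - c * dy) + gy * (c * dx - s * dy), gx, gy)
            obj += 0.5 * r * r
            for p in range(3):
                g[p] += J[p] * r
                for q in range(3):
                    H[p][q] += J[p] * J[q]
    return obj, H, g

def solve3(A, b):
    """Solve a 3x3 system by Gaussian elimination with partial pivoting."""
    M = [A[k][:] + [b[k]] for k in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x

def register(ref, tmpl, iters=10):
    """Single-level Gauss-Newton with step halving; returns (w, history)."""
    w, history = [0.0, 0.0, 0.0], []
    for _ in range(iters):
        obj, H, g = accumulate(ref, tmpl, w)
        history.append(obj)
        step = solve3(H, [-v for v in g])
        scale = 1.0
        while scale > 1e-3:         # accept the first step that lowers obj
            trial = [wi + scale * si for wi, si in zip(w, step)]
            if accumulate(ref, tmpl, trial)[0] < obj:
                w = trial
                break
            scale *= 0.5
    history.append(accumulate(ref, tmpl, w)[0])
    return w, history

if __name__ == "__main__":
    ref, tmpl = make_pair()
    w, history = register(ref, tmpl)
    print("estimated (theta, tx, ty):", w)
    print("objective: %.3e -> %.3e" % (history[0], history[-1]))
```

Because every pixel only adds into the same nine entries of H and three entries of g, the double loop in `accumulate` could be split across cores or chips and the partial sums combined afterwards, which is the kind of pixelwise parallelism the abstract refers to.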
