IEEE International Solid-State Circuits Conference

An 879GOPS 243mW 80fps VGA Fully Visual CNN-SLAM Processor for Wide-Range Autonomous Exploration

Abstract

Simultaneous localization and mapping (SLAM) estimates an agent's trajectory for all six degrees of freedom (6 DoF) and constructs a 3D map of unknown surroundings. It is a fundamental kernel that enables head-mounted augmented/virtual-reality devices and autonomous navigation of micro aerial vehicles. A noticeable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs) that outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched with stored keypoints to estimate the agent's 6-DoF pose by solving a perspective-n-point (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent's long-term trajectory over multiple frames is refined by a bundle-adjustment process (BA, Fig. 7.3.1, right), which involves a large-scale (~120-variable) non-linear optimization. Visual SLAM requires massive computation (>250GOP/s) for CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical: a ~3GHz CPU+GPU system takes 0.2s per frame with a >100MB memory footprint and >100W power consumption. Prior ASICs have either implemented an incomplete SLAM system [2,3] that lacks ego-motion estimation or employed simplified (non-CNN) feature extraction and tracking [2,4,5], limiting SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), simplifying the computation but incurring additional power and cost overhead.
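As a rough illustration of the per-frame pose-estimation step described above (a minimal sketch, not the paper's hardware pipeline), the following Python snippet recovers a 6-DoF pose from 2D-3D keypoint correspondences with OpenCV's RANSAC PnP solver; the camera intrinsics, point set, and ground-truth pose are all synthetic assumptions.

```python
import numpy as np
import cv2

# Synthetic scene: 3D keypoints in front of a VGA camera with assumed
# intrinsics (focal length and principal point are illustrative values).
K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)  # assume an undistorted camera

rng = np.random.default_rng(0)
pts3d = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(32, 3))

# Ground-truth pose used only to synthesize the 2D observations that a
# CNN feature-matching front end would normally supply.
rvec_true = np.array([0.05, -0.02, 0.01])  # axis-angle rotation
tvec_true = np.array([0.10, -0.05, 0.20])
pts2d, _ = cv2.projectPoints(pts3d, rvec_true, tvec_true, K, dist)

# Solve the perspective-n-point problem: find the 6-DoF pose whose
# reprojection best explains the 2D-3D matches; RANSAC rejects the
# outlier matches a real SLAM front end must tolerate.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, dist)
print("recovered rotation (axis-angle):", rvec.ravel())
print("recovered translation:", tvec.ravel())
```

Bundle adjustment plays the analogous role across many frames, jointly refining the camera poses and 3D map points in one larger non-linear least-squares problem (the ~120-variable optimization cited above).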