Precise localization using visual sensors is a fundamental requirement in many applications, including robotics, augmented reality, and autonomous systems. Traditionally, the localization problem has been tackled with 3D geometry registration approaches. Recently, end-to-end regression strategies based on deep convolutional neural networks have achieved impressive results, but they still fall short of the accuracy of 3D structure-based methods. To some extent, this gap has been addressed by exploiting sequential images or geometric constraints; however, these approaches yield only slight improvements. In this work, we address this problem for indoor scenarios, and we argue that regressing the camera pose from sparse feature descriptors could significantly improve pose regressor performance compared with a deep single-feature-vector representation. We propose a novel approach that directly consumes sparse feature descriptors to regress the camera pose effectively. More importantly, we propose a simple data augmentation procedure that exploits the sparse descriptors of unseen poses, leading to a remarkable improvement in generalization performance. Lastly, we present an extensive evaluation of our method on publicly available indoor datasets. Our FeatLoc achieves 22% and 40% improvements in translation error on 7-Scenes and 12-Scenes, respectively, compared with recent state-of-the-art absolute pose regression-based approaches. Our code is released at https://github.com/ais-lab/FeatLoc.