Visual localisation and object recognition are key goals of artificial intelligence research that have traditionally been investigated separately. Appearance-based methods can be used to treat both problems from a common perspective. The main purpose of this thesis is therefore to explore appearance-based methods in the specific contexts of object recognition and visual localisation from wearable and hand-held devices. Specifically, the contributions of this thesis are as follows:

The first topic of study was the recognition of grocery products in images acquired with hand-held and wearable cameras, a use case of particular relevance for blind and partially sighted people. The main contributions around this topic are a) the SHORT dataset, comprising 100 categories and more than 135,000 images across its training and query sets; and b) an open-source pipeline and complete evaluation of popular bag-of-visual-words (BoVW) techniques tested against SHORT. The SHORT dataset is novel in that it introduces a clear distinction between high-quality training images and query images taken in the wild. This is an anticipated scenario in which retailers would acquire images for their online shopping brochures, while users would submit images of unpredictable quality for recognition. The performance results of the methods tested demonstrate the challenging characteristics of SHORT.

The second subject of study was indoor localisation from hand-held and wearable cameras. For this topic, the RSM dataset was constructed, containing more than 90,000 video frames along more than 3 km of indoor journeys. An open-source pipeline and evaluation are also contributed in this area. The methods include a selection of custom-created single-frame and spatio-temporal image description methods. These are tested against baseline appearance-based methods such as SIFT and HOG3D, as well as state-of-the-art SLAM.
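The BoVW pipeline evaluated against SHORT follows the standard recipe: cluster local descriptors into a visual vocabulary, then represent each image as a normalised histogram of visual-word occurrences. The sketch below illustrates this with randomly generated stand-in descriptors and an illustrative vocabulary size; the thesis's actual descriptors, vocabulary size, and matching scheme are not specified here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in local descriptors (SIFT, for instance, is 128-D); in practice
# these would come from a detector/descriptor run over the training images.
train_desc = rng.normal(size=(500, 128))

# 1. Build the visual vocabulary by clustering the pooled descriptors.
k = 16  # illustrative vocabulary size, not the thesis's value
vocab = KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_desc)

def bovw_histogram(descriptors, vocab, k):
    """Quantise descriptors against the vocabulary and return a
    normalised histogram of visual-word occurrences."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

# 2. Describe a query image and match it to a database of training
#    images by distance between their histograms.
query_hist = bovw_histogram(rng.normal(size=(80, 128)), vocab, k)
db_hists = [bovw_histogram(rng.normal(size=(80, 128)), vocab, k)
            for _ in range(5)]
best = min(range(5),
           key=lambda i: np.linalg.norm(query_hist - db_hists[i]))
```

Euclidean distance between histograms is used here for brevity; BoVW systems commonly substitute chi-squared or histogram-intersection measures, and may weight words by TF-IDF.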
Results show that appearance-based methods, even in the absence of tracking, can provide enough information to infer location with errors as small as 1.5 m over a 50 m journey. Among the methods studied, results suggest that single-frame approaches perform slightly better than spatio-temporal ones.

Third, I have developed a novel biologically inspired model of artificial place cells, based on kernel distance metrics between appearance-based descriptions of query and database images. Localisation performance was again tested against the RSM dataset, achieving errors as low as 1.4 m over a 50 m trajectory and comparing favourably with state-of-the-art SLAM.

Finally, I have prototyped an assistive localisation system that uses wearable or hand-held visual input and tactile feedback to track the user's location over haptic maps. An evaluation of the quality of the tactile feedback provided by this approach is also given.
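The artificial place-cell idea can be sketched as follows: each database frame anchors a "cell" whose activation for a query is a kernel similarity between the two frames' appearance descriptors, and the most active cell yields the location estimate. The sketch uses an RBF kernel and synthetic descriptors; the kernel choice, descriptor, and bandwidth here are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Appearance descriptors for the database frames of a journey, one per
# frame, plus a query descriptor close to frame 42 (synthetic data).
db = rng.normal(size=(100, 64))
query = db[42] + 0.05 * rng.normal(size=64)

# RBF-kernel similarity between the query and each database frame acts
# as the activation of the artificial place cell anchored at that frame.
gamma = 0.1  # illustrative kernel bandwidth
activations = np.exp(-gamma * np.sum((db - query) ** 2, axis=1))

# The most active cell gives the location estimate along the journey.
estimate = int(np.argmax(activations))
```

Because activations decay smoothly with appearance distance, nearby frames respond together, giving the graded, localised response profile characteristic of biological place cells.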