
The Role of Riemannian Manifolds in Computer Vision: From Coding to Deep Metric Learning



Abstract

A diverse range of tasks in computer vision and machine learning benefit from data representations that are compact yet discriminative, informative, and robust to critical measurements. Two notable such representations are Region Covariance Descriptors (RCovDs) and linear subspaces, which are naturally analyzed through the manifold of Symmetric Positive Definite (SPD) matrices and the Grassmann manifold, respectively, two widely used Riemannian manifolds in computer vision.

As our first objective, we examine image- and video-based recognition applications in which the local descriptors carry the aforementioned Riemannian structures, namely the SPD or linear-subspace structure. We first provide a solution for computing a Riemannian version of the conventional Vector of Locally Aggregated Descriptors (VLAD), using the geodesic distance of the underlying manifold as the nearness measure. Next, by taking a closer look at the resulting codes, we formulate a new concept that we name Local Difference Vectors (LDVs). LDVs let us elegantly extend our Riemannian coding techniques to any arbitrary metric, and they provide intrinsic solutions to Riemannian sparse coding and its variants when local structured descriptors are considered.

We then turn our attention to two special types of covariance descriptors, namely infinite-dimensional RCovDs and rank-deficient covariance matrices, for which the underlying Riemannian structure, i.e. the manifold of SPD matrices, is largely out of reach. Generally speaking, infinite-dimensional RCovDs offer better discriminatory power than their low-dimensional counterparts. To overcome this difficulty, we propose to approximate infinite-dimensional RCovDs by means of two feature mappings, namely random Fourier features and the Nyström method. As for rank-deficient covariance matrices, unlike most existing approaches that rely on inference tools with predefined regularizers, we derive positive definite kernels that decompose into kernels on the cone of SPD matrices and kernels on Grassmann manifolds, and we show their effectiveness on the image-set classification task.

Furthermore, inspired by the attractive properties of Riemannian optimization techniques, we extend the recently introduced Keep It Simple and Straightforward MEtric learning (KISSME) method to scenarios where the input data is non-linearly distributed. To this end, we make use of infinite-dimensional covariance matrices and propose techniques for projecting onto the positive cone in a Reproducing Kernel Hilbert Space (RKHS). We also address the sensitivity of KISSME to the input dimensionality: the algorithm depends heavily on Principal Component Analysis (PCA) as a preprocessing step, which can cause difficulties, especially when the dimensionality is not set carefully. To address this issue, building on the KISSME algorithm, we develop a Riemannian framework that jointly learns a dimensionality-reducing mapping and a metric in the induced space. Lastly, in line with the recent trend in metric learning, we devise end-to-end learning of a generic deep network for metric learning using our derivation.
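To make the feature-mapping idea above concrete, the following is a minimal Python/NumPy sketch, not the thesis's implementation, of a finite-dimensional surrogate for an infinite-dimensional RCovD: local descriptors are mapped with random Fourier features for a Gaussian kernel and their covariance is taken in the mapped space. The function names, bandwidth and feature count are illustrative assumptions; the Nyström alternative mentioned in the abstract would replace the random map with one built from a sampled subset of descriptors.

    import numpy as np

    def rff_map(X, n_features=128, sigma=1.0, seed=0):
        # Random Fourier features approximating a Gaussian kernel
        # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))  (Rahimi & Recht, 2007).
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    def approx_rcovd(X, n_features=128, sigma=1.0, eps=1e-6, seed=0):
        # Covariance of the mapped local descriptors: a finite-dimensional
        # stand-in for the infinite-dimensional RCovD, regularised to stay SPD.
        Z = rff_map(X, n_features, sigma, seed)
        Z = Z - Z.mean(axis=0, keepdims=True)
        C = (Z.T @ Z) / (len(Z) - 1)
        return C + eps * np.eye(n_features)

    # Toy usage: 200 local descriptors of dimension 8 from one image region.
    X = np.random.default_rng(42).normal(size=(200, 8))
    C = approx_rcovd(X, n_features=64, sigma=2.0)
    print(C.shape)  # (64, 64), symmetric positive definite

In such a sketch, the bandwidth sigma and the number of random features trade the accuracy of the kernel approximation against the size of the resulting descriptor.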
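For reference, the baseline that the metric-learning part of the thesis builds on can be sketched as below. This is plain Euclidean KISSME with a projection onto the positive semidefinite cone, not the RKHS or Riemannian extension the abstract proposes; the function names and toy data are illustrative assumptions.

    import numpy as np

    def kissme_metric(X, sim_pairs, dif_pairs, eps=1e-6):
        # Vanilla KISSME (Koestinger et al., 2012): M = inv(Sigma_S) - inv(Sigma_D),
        # estimated from difference vectors of similar (S) and dissimilar (D) pairs.
        def pair_cov(pairs):
            d = X[pairs[:, 0]] - X[pairs[:, 1]]
            return d.T @ d / len(d)
        dim = X.shape[1]
        sigma_s = pair_cov(sim_pairs) + eps * np.eye(dim)
        sigma_d = pair_cov(dif_pairs) + eps * np.eye(dim)
        M = np.linalg.inv(sigma_s) - np.linalg.inv(sigma_d)
        # Projection onto the positive semidefinite cone: clip negative eigenvalues.
        w, V = np.linalg.eigh(M)
        return (V * np.clip(w, 0.0, None)) @ V.T

    def kissme_distance(M, x, y):
        d = x - y
        return float(d @ M @ d)

    # Toy usage with random data and random pair labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 16))
    sim = rng.integers(0, 100, size=(300, 2))
    dif = rng.integers(0, 100, size=(300, 2))
    M = kissme_metric(X, sim, dif)
    print(kissme_distance(M, X[0], X[1]) >= 0.0)  # True: M is positive semidefinite

Clipping negative eigenvalues is the standard way to return inv(Sigma_S) - inv(Sigma_D) to the cone of valid Mahalanobis matrices; the thesis instead performs this projection in a Reproducing Kernel Hilbert Space and learns the dimensionality-reducing mapping jointly with the metric.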

Bibliographic details

  • Author: Faraki, Masoud
  • Affiliation: The Australian National University (Australia)
  • Degree-granting institution: The Australian National University (Australia)
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 131 p.
  • Total pages: 131
  • Original format: PDF
  • Language of text: eng
  • CLC classification: Physiology
  • Keywords:
  • Date added: 2022-08-17 11:37:37
