Handling missing data in high-dimensional subspace modeling.

Abstract

Low-dimensional linear subspace approximations to high-dimensional data are powerful enough to capture a great deal of structure in many signals, and yet they also offer simplicity and ease of analysis. Because of this, they have provided a powerful tool to many areas of engineering and science: problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing.

Big data are making a big splash, with everyone from bookstores to stock brokers, hospitals to libraries, ecologists to military generals looking to capitalize on data collection opportunities. Big datasets are by definition massive, requiring computationally efficient techniques. Even more consequential is that the data quality is impossible to control. It is truly inevitable that there will be missing data and corrupted measurements. Most classical statistical techniques implicitly assume that these issues have been “cleaned” away before modeling. This fact has encouraged research development on new signal processing techniques to address these issues directly; two such problems of study have been termed “Matrix Completion” and “Robust PCA.” This thesis makes fundamental contributions in these areas and on related issues of subspace modeling.

In this thesis we show some fundamental results on how to estimate a projection residual from an incomplete data vector, and how our theory provides powerful tools for developing algorithms for subspace estimation and tracking with incomplete data. I present the algorithm GROUSE (Grassmannian Rank-One Update Subspace Estimation), a subspace tracking algorithm that performs gradient descent on the Grassmannian (the manifold of all subspaces). I present two algorithms for high rank matrix completion, a problem where columns of a matrix are incomplete and arise from a union of subspaces; the resulting matrix can be full rank. The first algorithm we present, based on incomplete projection residuals, can provably complete columns of the matrix with high probability. The second algorithm, k-GROUSE, is computationally efficient. We discuss a robust subspace tracking algorithm, GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm), which is based on the analogous ℓ1 cost function. We also discuss an approach to the column subset selection problem with missing data.
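The two ideas named in the abstract, a projection residual computed from only the observed entries of a vector and a rank-one gradient step on the Grassmannian, can be illustrated with a short sketch. This is not code from the thesis. It assumes the commonly published form of the GROUSE update (a rotation of the basis in the plane spanned by the current projection and the zero-padded residual), and the function names `incomplete_residual` and `grouse_step`, the argument names, and the step size `eta` are all hypothetical choices made for illustration.

```python
import numpy as np

def incomplete_residual(U, v_obs, omega):
    """Fit v on the observed rows of U only (assumed least-squares formulation).

    U     : (n, d) matrix with orthonormal columns (current subspace estimate)
    v_obs : observed entries of the data vector, in the order given by omega
    omega : integer indices of the observed entries
    Returns the weights w and the residual restricted to omega.
    """
    U_obs = U[omega, :]
    w, *_ = np.linalg.lstsq(U_obs, v_obs, rcond=None)
    r_obs = v_obs - U_obs @ w
    return w, r_obs

def grouse_step(U, v_obs, omega, eta=0.1):
    """One rank-one update of U from a partially observed vector (sketch)."""
    n, _ = U.shape
    w, r_obs = incomplete_residual(U, v_obs, omega)

    # Zero-pad the residual to full length; it is orthogonal to the columns of U.
    r = np.zeros(n)
    r[omega] = r_obs
    p = U @ w  # prediction of the full vector from the current subspace

    r_norm = np.linalg.norm(r)
    p_norm = np.linalg.norm(p)
    w_norm = np.linalg.norm(w)
    if r_norm < 1e-12 or w_norm < 1e-12:
        return U  # nothing to learn from this vector

    # Rotate by an angle proportional to the step size and the residual size.
    theta = eta * r_norm * p_norm
    direction = (np.cos(theta) - 1.0) * p / p_norm + np.sin(theta) * r / r_norm
    return U + np.outer(direction, w / w_norm)
```

In a streaming setting one would call `grouse_step` once per incoming column, passing that column's observed indices. Under the assumptions above the update keeps the columns of `U` orthonormal, so no re-orthogonalization is needed between steps.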

Bibliographic record

  • Author: Balzano, Laura Kathryn
  • Author affiliation: The University of Wisconsin - Madison
  • Degree-granting institution: The University of Wisconsin - Madison
  • Subject: Engineering, Electronics and Electrical; Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 149
  • Original format: PDF
  • Language: English