
CUR matrix decompositions for improved data analysis



Abstract

Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high “statistical leverage” and, thus, in a very precise statistical sense, exert a disproportionately large “influence” on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.
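
The leverage-based selection described in the abstract can be illustrated with a minimal sketch. It assumes the normalized leverage score of column j is the squared Euclidean norm of the j-th column of the top-k right singular vectors, divided by k, so that the scores sum to one. The function names, the parameters k, c and r, and the sampling scheme (a fixed number of columns and rows drawn without replacement) are illustrative choices and not necessarily the exact procedure of the article.

```python
import numpy as np


def leverage_scores(A, k):
    """Normalized column leverage scores of A relative to its best rank-k fit."""
    # Rows of Vt are the right singular vectors of A, ordered by singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0) / k
    return scores / scores.sum()  # guard against round-off; already sums to ~1


def cur_decomposition(A, k, c, r, seed=0):
    """Sketch of a CUR decomposition: A is approximated by C @ U @ R.

    C holds c actual columns of A and R holds r actual rows of A, both
    sampled with probability proportional to their leverage scores.
    Assumption: sampling without replacement is a simplification made here.
    """
    rng = np.random.default_rng(seed)
    col_p = leverage_scores(A, k)    # leverage of each column
    row_p = leverage_scores(A.T, k)  # leverage of each row
    cols = rng.choice(A.shape[1], size=c, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=row_p)
    C = A[:, cols]
    R = A[rows, :]
    # U is chosen so that C @ U @ R is the best fit to A in the Frobenius norm
    # given C and R.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, cols, rows
```

Calling `cur_decomposition(A, k, c, r)` on a data matrix `A` returns the index arrays `cols` and `rows` alongside the factors, so the selected columns and rows can be traced back to named features and observations. That traceability is the interpretability advantage the abstract attributes to CUR over singular vectors.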
