Randomized algorithms for matrix operations.

Abstract

In this thesis, we present fast, Monte-Carlo algorithms for performing useful computations on large matrices (e.g. matrix multiplication, Singular Value Decomposition (SVD), etc.). Our algorithms are quite different from traditional numerical analysis approaches and generally fit the following model: we assume that the input matrices are too large to store in RAM and can reside only in external memory. We are allowed to read the matrices a few times and keep a “sketch” of the matrices in RAM. We allow only constant processing time after reading each element of the matrices; computing the “sketch” should be a very fast process. Indeed, one of the contributions of our work is to demonstrate that a “sketch” consisting only of a random sample of rows and/or columns of the matrices is adequate for efficient approximations. A crucial issue is how to form the random sample; an obvious choice is “blind” sampling (e.g. uniform sampling). Instead, we use adaptive sampling: in one pass through the data we compute sampling probabilities (we keep rows/columns with probability proportional to the square of their lengths) and, in a second pass, we draw the sample. Adaptive sampling offers substantial gains; it allows us to approximately solve problems in sparse matrices as well as dense matrices. After these two passes, we perform computations only on the “sketch” (the rows/columns that we kept).

The main focus of this thesis is a thorough presentation of the algorithms; we also address the issue of their optimality. We expect our algorithms to be useful in a large variety of settings where large data sets are modelled by matrices; indeed, we demonstrate their power by using them as tools to design approximation algorithms for various problems (mostly Information Retrieval and Data Mining applications). We believe that the underlying principle of this thesis (using adaptive sampling to create “sketches” of the data in a small number of passes) constitutes an appealing and intuitive direction to address the size and nature of modern data sets.
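As an illustration of the adaptive, length-squared sampling idea described in the abstract, the following is a minimal sketch of two-pass approximate matrix multiplication. It assumes Python with NumPy and in-memory arrays rather than the external-memory setting of the thesis; the function name approx_matmul and the exact choice of sampling distribution are illustrative, not taken from the thesis itself.

    import numpy as np

    def approx_matmul(A, B, s, rng=None):
        """Approximate A @ B from a sketch of s sampled column/row pairs."""
        rng = np.random.default_rng() if rng is None else rng
        # Pass 1: adaptive sampling probabilities, proportional to the
        # squared lengths of the columns of A.
        w = np.sum(A * A, axis=0)
        p = w / w.sum()
        # Pass 2: draw s column/row pairs and rescale each sampled pair
        # by 1/sqrt(s * p_i) so that the estimate is unbiased.
        idx = rng.choice(A.shape[1], size=s, replace=True, p=p)
        scale = 1.0 / np.sqrt(s * p[idx])
        C = A[:, idx] * scale            # sketch: s rescaled columns of A
        R = B[idx, :] * scale[:, None]   # sketch: s rescaled rows of B
        return C @ R                     # all further work uses only the sketch

With this rescaling, C @ R is an unbiased estimator of A @ B, and the expected error shrinks as the number of sampled pairs s grows; the thesis develops and analyzes sampling schemes of this kind and extends them to the SVD and other matrix computations.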

Bibliographic details

  • Author

    Drineas, Petros S.

  • Affiliation

    Yale University

  • Awarding institution: Yale University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2003
  • Pages: 118 p.
  • Total pages: 118
  • Format: PDF
  • Language: eng
  • CLC classification: Automation technology; computer technology
  • Keywords
