On randomized sketching algorithms and the Tracy–Widom law

Daniel Ahfock; William J. Astle; Sylvia Richardson

摘要

Abstract There is an increasing body of work exploring the integration of random projection into algorithms for numerical linear algebra. The primary motivation is to reduce the overall computational cost of processing large datasets. A suitably chosen random projection can be used to embed the original dataset in a lower-dimensional space such that key properties of the original dataset are retained. These algorithms are often referred to as sketching algorithms, as the projected dataset can be used as a compressed representation of the full dataset. We show that random matrix theory, in particular the Tracy–Widom law, is useful for describing the operating characteristics of sketching algorithms in the tall-data regime when the sample size n is much greater than the number of variables d. Asymptotic large sample results are of particular interest as this is the regime where sketching is most useful for data compression. In particular, we develop asymptotic approximations for the success rate in generating random subspace embeddings and the convergence probability of iterative sketching algorithms. We test a number of sketching algorithms on real large high-dimensional datasets and find that the asymptotic expressions give accurate predictions of the empirical performance.

机译：摘要越来越多的工作探索将随机投影集成到数值线性代数算法中。主要动机是降低处理大型数据集的总体计算成本。可以使用适当选择的随机投影将原始数据集嵌入到低维空间中，从而保留原始数据集的关键属性。这些算法通常称为草图绘制算法，因为投影数据集可以用作完整数据集的压缩表示。我们表明，随机矩阵理论，特别是Tracy-Widom定律，对于描述样本数量n远大于变量d时，草图算法在高数据范围内的操作特征是有用的。渐近大样本结果特别令人感兴趣，因为这是草图绘制对数据压缩最有用的方法。特别是，我们开发了生成随机子空间嵌入的成功率和迭代草图算法的收敛概率的渐近近似。我们在真实的大型高维数据集上测试了多种草图绘制算法，发现渐近表达式可以准确预测经验性能。

On randomized sketching algorithms and the Tracy–Widom law

摘要

著录项

引文网络

相关主题

期刊订阅