Journal: Applied Numerical Mathematics
A randomized exponential canonical correlation analysis method for data analysis and dimensionality reduction

Abstract

Canonical correlation analysis (CCA) is a well-known data analysis method that has been used successfully in many areas. CCA extracts meaningful information from a pair of data sets by seeking pairs of linear combinations, one from each set of variables, with maximum correlation. Mathematically, CCA amounts to solving a large-scale generalized eigenvalue problem. However, when the dimension of the data sets is much larger than the number of samples, CCA may suffer from the small-sample-size (SSS) problem and from over-fitting. Regularization is often applied to overcome these difficulties, but it is difficult to choose the optimal regularization parameter in advance. In this work, we propose an Exponential Canonical Correlation Analysis (ECCA) method based on the matrix exponential, which is parameter-free and can overcome the over-fitting and SSS problems fundamentally. However, the computational overhead of the ECCA method is very high in practice. Based on the randomized singular value decomposition (RSVD), we then propose a Randomized Exponential Canonical Correlation Analysis (RECCA) method for data analysis and dimensionality reduction. Theoretical results are given to justify this randomized method and to establish the relationship between RECCA and ECCA. Numerical experiments on real-world, high-dimensional and large-sample data sets illustrate the superiority of the proposed algorithms over many state-of-the-art CCA algorithms.
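
For orientation, the generalized eigenvalue problem behind classical CCA, and the reason it breaks down in the small-sample-size regime, can be written out as follows. This is a sketch of the standard textbook formulation, not the paper's own derivation. The notation is an assumption introduced here: centered data matrices $X \in \mathbb{R}^{p \times n}$ and $Y \in \mathbb{R}^{q \times n}$ with $n$ samples as columns, covariance blocks $C_{xx}, C_{yy}, C_{xy}$, and regularization parameters $\kappa_x, \kappa_y$. The exponential variant proposed in the abstract is not reproduced, since its exact form is not given on this page.

```latex
% Assumed notation: X is p-by-n, Y is q-by-n, both centered, n samples as columns.
\[
  C_{xx} = \tfrac{1}{n} X X^{\top}, \qquad
  C_{yy} = \tfrac{1}{n} Y Y^{\top}, \qquad
  C_{xy} = \tfrac{1}{n} X Y^{\top} = C_{yx}^{\top}.
\]

% CCA seeks directions w_x, w_y with maximally correlated projections.
\[
  \rho = \max_{w_x,\, w_y}
  \frac{w_x^{\top} C_{xy}\, w_y}
       {\sqrt{w_x^{\top} C_{xx}\, w_x}\;\sqrt{w_y^{\top} C_{yy}\, w_y}} .
\]

% The stationarity conditions give a generalized eigenvalue problem.
\[
  \begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}
  \begin{pmatrix} w_x \\ w_y \end{pmatrix}
  = \rho
  \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}
  \begin{pmatrix} w_x \\ w_y \end{pmatrix}.
\]

% Small-sample-size regime: if p > n then rank(C_xx) <= n < p, so C_xx is
% singular (likewise C_yy if q > n) and the right-hand matrix is not
% positive definite.
% Regularized CCA replaces the blocks by shifted versions, at the cost of
% having to choose kappa_x, kappa_y > 0:
\[
  C_{xx} \;\rightarrow\; C_{xx} + \kappa_x I_p, \qquad
  C_{yy} \;\rightarrow\; C_{yy} + \kappa_y I_q .
\]
```

The shifted blocks are positive definite for any $\kappa_x, \kappa_y > 0$, which restores a well-posed problem but leaves the parameter choice open. That parameter choice is the difficulty the abstract says the parameter-free ECCA method is meant to avoid.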