Journal: Journal of Classification

The Analysis of Multivariate Data Using Semi-Definite Programming

Abstract

A model is presented for analyzing general multivariate data. The model's prime objective is dimensionality reduction of the multivariate problem. Its only requirement is that the input to the statistical analysis be a covariance matrix, a correlation matrix, or more generally a positive semi-definite matrix. The model is parameterized by a scale parameter and a shape parameter, both of which take non-negative values smaller than unity. We first prove a well-known heuristic for minimizing rank and establish the conditions under which rank can be replaced with trace. This result allows us to solve our rank minimization problem as a Semi-Definite Programming (SDP) problem using any of a number of available solvers. We then apply the model in four case studies, each dealing with a well-known problem in multivariate analysis. The first problem is to determine the number of underlying factors in factor analysis (FA) or the number of retained components in principal component analysis (PCA). It is shown that our model determines the number of factors or components more efficiently than the commonly used methods. The second problem, which has received much attention in recent years because of its wide applications, concerns sparse principal components and variable selection in PCA. When applied to a data set known in the literature as the pitprop data, our approach yields PCs with larger variances than PCs derived from other approaches. The third problem concerns sensitivity analysis of multivariate models, a topic not widely researched in the literature because of its difficulty. Finally, we apply the model to a difficult problem in PCA known as the lack of scale invariance of PCA solutions: the solutions derived from analyzing the covariance matrix are generally different from (and not linearly related to) the solutions derived from analyzing the correlation matrix. Using our model, we obtain the same solution whether we analyze the correlation matrix or the covariance matrix, since the analysis uses only the signs of the correlations/covariances and not their values. This is where we introduce a new type of PCA, called Sign PCA, and speculate on its applications in the social sciences and other fields of science.
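
The scale-invariance claim rests on a simple fact: a correlation is a covariance divided by a product of positive standard deviations, so the two always share a sign. The sketch below, in plain Python, checks this on a made-up three-variable data set. The helper names and the data are illustrative assumptions; the code shows only why the sign pattern is common to both matrices, not the paper's SDP model or Sign PCA itself.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]

def correlation_matrix(cov):
    """Rescale a covariance matrix by the (positive) standard deviations."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]

def sign_matrix(m):
    """Entrywise sign: +1, 0, or -1."""
    return [[(v > 0) - (v < 0) for v in row] for row in m]

# Hypothetical data: 5 observations of 3 variables on very different scales.
data = [
    [1.0, 2.0, 50.0],
    [2.0, 1.0, 40.0],
    [3.0, 4.0, 45.0],
    [4.0, 3.0, 20.0],
    [5.0, 6.0, 10.0],
]

cov = covariance_matrix(data)
corr = correlation_matrix(cov)
signs_cov = sign_matrix(cov)
signs_corr = sign_matrix(corr)

# Changing the units of the third variable changes the covariances
# but leaves every sign untouched.
rescaled = [[x, y, 100.0 * z] for x, y, z in data]
signs_rescaled = sign_matrix(covariance_matrix(rescaled))

print(signs_cov == signs_corr == signs_rescaled)
```

Any analysis that takes only this sign matrix as input therefore cannot distinguish the covariance matrix from the correlation matrix, nor one choice of measurement units from another.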
