European Conference on Machine Learning (ECML 2007); 17-21 September 2007; Warsaw (PL)

Principal Component Analysis for Large Scale Problems with Lots of Missing Values

Abstract

Principal component analysis (PCA) is a well-known classical data analysis technique. A number of algorithms exist for solving the problem, and some scale better than others to high-dimensional problems. They also differ in their ability to handle missing values in the data. We study a case where the data are high-dimensional and a majority of the values are missing. With very sparse data, overfitting becomes a severe problem even in simple linear models such as PCA. We propose an algorithm based on speeding up a simple principal subspace rule, and extend it to use regularization and variational Bayesian (VB) learning. Experiments with Netflix data confirm that the proposed algorithm is much faster than any of the compared methods, and that the VB-PCA method provides more accurate predictions for new data than traditional PCA or regularized PCA.
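
To make the setting concrete, the following is a minimal sketch of regularized PCA fitted only on observed entries. It is not the authors' algorithm: plain gradient descent stands in for the sped-up principal subspace rule, a simple L2 penalty stands in for the regularization, and the VB extension is not shown. The function name `masked_pca` and the parameters `lam`, `lr`, and `n_iters` are hypothetical choices made for illustration.

```python
import numpy as np


def masked_pca(X, mask, n_components=2, lam=0.1, lr=0.005, n_iters=2000, seed=0):
    """Fit X ~ A @ S using only the observed entries of X.

    X    : (d, n) data matrix; values where mask is False are ignored.
    mask : (d, n) boolean array, True where X is observed.
    lam  : L2 penalty on A and S (illustrative regularizer, an assumption).
    lr   : step size for plain gradient descent (an assumption).
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Small random initialization of the loadings A and the components S.
    A = 0.1 * rng.standard_normal((d, n_components))
    S = 0.1 * rng.standard_normal((n_components, n))
    # Replace missing values (possibly NaN) by zero; the mask hides them anyway.
    X0 = np.where(mask, X, 0.0)
    for _ in range(n_iters):
        # Reconstruction error, set to zero on missing entries.
        E = np.where(mask, X0 - A @ S, 0.0)
        # Gradients of 0.5 * ||E||^2 + 0.5 * lam * (||A||^2 + ||S||^2).
        grad_A = -E @ S.T + lam * A
        grad_S = -A.T @ E + lam * S
        A -= lr * grad_A
        S -= lr * grad_S
    return A, S


def observed_rmse(X, mask, A, S):
    """Root-mean-square reconstruction error over observed entries only."""
    return np.sqrt(np.mean((X - A @ S)[mask] ** 2))
```

Calling `masked_pca(X, mask)` returns the factors `A` and `S`, and `A @ S` then gives predictions for the missing entries. With a majority of values missing, the penalty `lam` is what limits overfitting in this sketch; setting it to zero recovers an unregularized least-squares fit on the observed entries.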
