Principal Component Analysis for Large Scale Problems with Lots of Missing Values

Abstract

Principal component analysis (PCA) is a well-known classical data analysis technique. There are a number of algorithms for solving the problem, some scaling better than others to problems with high dimensionality. They also differ in their ability to handle missing values in the data. We study a case where the data are high-dimensional and a majority of the values are missing. In the case of very sparse data, overfitting becomes a severe problem even in simple linear models such as PCA. We propose an algorithm based on speeding up a simple principal subspace rule, and extend it to use regularization and variational Bayesian (VB) learning. The experiments with Netflix data confirm that the proposed algorithm is much faster than any of the compared methods, and that the VB-PCA method provides more accurate predictions for new data than traditional PCA or regularized PCA.
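
As a rough illustration of the setting the abstract describes, the sketch below fits a low-rank linear model to a matrix in which only some entries are observed, with an L2 penalty to limit overfitting. It uses plain stochastic gradient descent and is not the paper's speed-up rule or its VB-PCA method. The function names (`fit_pca_missing`, `predict`, `rmse`), the hyperparameters, and the toy data are all assumptions made for this example.

```python
import random


def fit_pca_missing(observed, n_rows, n_cols, n_components=1,
                    lr=0.02, reg=0.01, n_epochs=500, seed=0):
    """Fit x_ij ~ sum_k A[i][k] * S[j][k] from observed entries only.

    observed: list of (row, col, value) triples; every other entry is
    treated as missing and never enters the cost function.
    """
    rng = random.Random(seed)
    # Small random initial factors (hypothetical choice of scale).
    A = [[rng.gauss(0, 0.1) for _ in range(n_components)] for _ in range(n_rows)]
    S = [[rng.gauss(0, 0.1) for _ in range(n_components)] for _ in range(n_cols)]
    entries = list(observed)
    for _ in range(n_epochs):
        rng.shuffle(entries)
        for i, j, x in entries:
            a, s = A[i], S[j]
            err = x - sum(ak * sk for ak, sk in zip(a, s))
            for k in range(n_components):
                ak, sk = a[k], s[k]
                # Gradient step on squared error plus L2 regularization.
                a[k] += lr * (err * sk - reg * ak)
                s[k] += lr * (err * ak - reg * sk)
    return A, S


def predict(A, S, i, j):
    """Reconstruct entry (i, j) from the fitted factors."""
    return sum(ak * sk for ak, sk in zip(A[i], S[j]))


def rmse(A, S, entries):
    """Root-mean-square error over a list of (row, col, value) triples."""
    sq = [(x - predict(A, S, i, j)) ** 2 for i, j, x in entries]
    return (sum(sq) / len(sq)) ** 0.5


# Toy rank-1 data (made up for this example) with four entries held out.
u = [0.5, 1.0, 1.5, 2.0]
v = [1.0, 0.5, -1.0, 2.0, 1.5]
hidden = {(0, 0), (1, 3), (2, 1), (3, 4)}
train = [(i, j, u[i] * v[j]) for i in range(4) for j in range(5)
         if (i, j) not in hidden]
held_out = [(i, j, u[i] * v[j]) for (i, j) in sorted(hidden)]

A, S = fit_pca_missing(train, n_rows=4, n_cols=5)
print("train RMSE:", rmse(A, S, train))
print("held-out RMSE:", rmse(A, S, held_out))
```

Storing the data as (row, column, value) triples means the cost of one pass grows with the number of observed values rather than with the full matrix size, which is what makes this style of update practical when most values are missing.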

Bibliographic record

Source
European Conference on Machine Learning (ECML 2007); 17-21 September 2007; Warsaw (PL) | 2007 | pp. 691-698 | 8 pages
Conference location: Warsaw (PL)
Authors
Tapani Raiko; Alexander Ilin; Juha Karhunen
Author affiliation

Adaptive Informatics Research Center, Helsinki Univ. of Technology, P.O. Box 5400, FI-02015 TKK, Finland

Original format: PDF
Language of text: English
Chinese Library Classification: Artificial intelligence theory
Date indexed: 2022-08-26 14:09:17

Similar literature

1. Dynamic principal component analysis with missing values [J]. Kwon Junhyeon, Oh Hee-Seok, Lim Yaeji. Journal of Applied Statistics. 2020, no. 9a12

2. Principal Component Analysis of Process Datasets with Missing Values [J]. Kristen A. Severson, Mark C. Molaro, Richard D. Braatz. Processes. 2017, no. 3

3. Principal component analysis with missing values: a comparative survey of methods [J]. Dray Stephane, Josse Julie. Plant Ecology. 2015, no. 5

4. Principal Component Analysis for Large Scale Problems with Lots of Missing Values [C]. Tapani Raiko, Alexander Ilin, Juha Karhunen. European Conference on Machine Learning. 2007

5. Health Education and the Principal: An Analysis of Principals' Health Values, Health Behaviors and School Health Instruction Components in Selected Schools (Well-Being, Middle School, Junior High, Administrator) [D]. Smith, Dennis Wesley. 1985

6. Extracting Common Mode Errors of Regional GNSS Position Time Series in the Presence of Missing Data by Variational Bayesian Principal Component Analysis [O]. Wudong Li, Weiping Jiang, Zhao Li. 2020

7. Principal component analysis for large scale problems with lots of missing values [O]. Tapani Raiko, Alexander Ilin, Juha Karhunen. 2007
