Principal Component Analysis for Distributed Data Sets with Updating

Abstract

Identifying the patterns of large data sets is a key requirement in data mining. A powerful technique for this purpose is the principal component analysis (PCA). PCA-based clustering algorithms are effective when the data sets are found in the same location. In applications where the large data sets are physically far apart, moving huge amounts of data to a single location can become an impractical, or even impossible, task. A way around this problem was proposed in [10], where truncated singular value decompositions (SVDs) are computed locally and used to reduce the communication costs. Unfortunately, truncated SVDs introduce local approximation errors that could add up and would adversely affect the accuracy of the final PCA. In this paper, we introduce a new method to compute the PCA without incurring local approximation errors. In addition, we consider the situation of updating the PCA when new data arrive at the various locations.
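The abstract does not spell out the new method, so the following is only a minimal sketch of one well-known way to obtain an exact PCA from distributed blocks without truncating anything locally. It is an assumption for illustration, not necessarily the authors' algorithm. Each site sends its row count, its column means, and the triangular factor of a QR decomposition of its locally centered data; a central site stacks these small factors, adds one correction row per site for the difference between the local and global means, and takes a single SVD. The function names `local_summary` and `merge_summaries` are hypothetical.

```python
import numpy as np

def local_summary(X):
    """Run at one site: return (row count, column means, R factor of the centered block)."""
    mean = X.mean(axis=0)
    # R has at most p rows (p = number of columns), so it is cheap to transmit.
    # R.T @ R equals the Gram matrix of the locally centered data, with no truncation.
    R = np.linalg.qr(X - mean, mode="r")
    return X.shape[0], mean, R

def merge_summaries(summaries):
    """Run centrally: combine site summaries into the global mean,
    the principal variances, and the principal directions (rows of Vt)."""
    n = sum(n_i for n_i, _, _ in summaries)
    mean = sum(n_i * m_i for n_i, m_i, _ in summaries) / n
    blocks = []
    for n_i, m_i, R_i in summaries:
        blocks.append(R_i)
        # Correction row: accounts for the shift between local and global mean.
        blocks.append(np.sqrt(n_i) * (m_i - mean)[None, :])
    # The stacked matrix has the same Gram matrix as the globally centered data,
    # so its singular values and right singular vectors match the centralized PCA.
    _, s, Vt = np.linalg.svd(np.vstack(blocks), full_matrices=False)
    return mean, s**2 / (n - 1), Vt
```

Under the same assumptions, a batch of new rows arriving at a site could be summarized with `local_summary` as one more block and merged with the stored summaries, which is one simple (not necessarily the cheapest) way to refresh the PCA.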

Bibliographic record

Source
International Workshop on Advanced Parallel Processing Technologies (APPT 2005); October 27-28, 2005; Hong Kong (CN) | 2005 | pp. 471-483 | 13 pages
Conference location: Hong Kong (CN)
Authors
Zheng-Jian Bai; Raymond H. Chan; Franklin T. Luk
Affiliation
Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543
Original format: PDF
Language: English
Chinese Library Classification: Theory and methods; Computer software

Similar literature

1. PM₁₀ and gaseous pollutants trends from air quality monitoring networks in Bari province: principal component analysis and absolute principal component scores on a two years and half data set [J]. Pierina Ielpo, Vincenzo Paolillo, Gianluigi de Gennaro, Chemistry Central Journal. 2014, No. 1
2. PM 10 and gaseous pollutants trends from air quality monitoring networks in Bari province: principal component analysis and absolute principal component scores on a two years and half data set [J]. Pierina Ielpo, Vincenzo Paolillo, Gianluigi de Gennaro, Chemistry Central Journal. 2014, No. 1
3. Combining multiway principal component analysis (MPCA) and clustering for efficient data mining of historical data sets of SBR processes [J]. Villez K, Ruiz M, Sin G, Water Science and Technology. 2008, No. 10
4. Principal Component Analysis for Distributed Data Sets with Updating [C]. Zheng-Jian Bai, Raymond H. Chan, Franklin T. Luk, International Workshop on Advanced Parallel Processing Technologies. 2005
5. Using principal component analysis (PCA) to obtain auxiliary variables for missing data in large data sets. [D]. Howard, Waylon J. 2012
6. PM10 and gaseous pollutants trends from air quality monitoring networks in Bari province: principal component analysis and absolute principal component scores on a two years and half data set [O]. Pierina Ielpo, Vincenzo Paolillo, Gianluigi de Gennaro, 2014
7. Principal Component Analysis for Distributed Data Sets with Updating [O]. Zheng-Jian Bai, Raymond H. Chan, Franklin T. Luk, 2005
