
Principal Component Analysis for Distributed Data Sets with Updating



Abstract

Identifying patterns in large data sets is a key requirement in data mining. A powerful technique for this purpose is principal component analysis (PCA). PCA-based clustering algorithms are effective when the data sets reside at a single location. In applications where large data sets are physically far apart, moving huge amounts of data to one location can become impractical or even impossible. A way around this problem was proposed in [10], where truncated singular value decompositions (SVDs) are computed locally and used to reduce the communication costs. Unfortunately, truncated SVDs introduce local approximation errors that can accumulate and degrade the accuracy of the final PCA. In this paper, we introduce a new method to compute the PCA without incurring local approximation errors. In addition, we consider how to update the PCA when new data arrive at the various locations.
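
The abstract does not spell out the method, so the following is only a minimal sketch of one way to obtain an exact distributed PCA, not necessarily the paper's algorithm. It assumes each site sends its sample count, its local mean, and the untruncated R factor of a QR decomposition of its locally centered data. The function names (local_summary, merge_summaries, pca_from_summary) are hypothetical and chosen for illustration.

```python
import numpy as np

def local_summary(X):
    """Summarize one site's data (rows are observations) with no truncation."""
    n = X.shape[0]
    mean = X.mean(axis=0)
    # R.T @ R equals the scatter matrix of the locally centered data.
    R = np.linalg.qr(X - mean, mode="r")
    return n, mean, R

def merge_summaries(summaries):
    """Combine (n, mean, R) summaries into one summary of the pooled data."""
    n_total = sum(n for n, _, _ in summaries)
    global_mean = sum(n * m for n, m, _ in summaries) / n_total
    blocks = []
    for n, m, R in summaries:
        blocks.append(R)
        # Extra row accounts for the shift from the local to the global mean.
        blocks.append(np.sqrt(n) * (m - global_mean)[None, :])
    # R factor of the stacked blocks reproduces the pooled scatter matrix.
    R = np.linalg.qr(np.vstack(blocks), mode="r")
    return n_total, global_mean, R

def pca_from_summary(n_total, R):
    """Return component variances (descending) and principal directions (rows)."""
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    return s**2 / (n_total - 1), Vt
```

Under the same assumptions, an update can be sketched by treating a newly arrived batch as one more summary: merge_summaries([(n, mean, R), local_summary(X_new)]) yields a summary of the enlarged data set without revisiting the old observations. Each site sends only a small triangular factor and a mean vector, so communication depends on the number of variables rather than the number of observations.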
