Published in: Algorithmica

Parallel Computation of High-Dimensional Robust Correlation and Covariance Matrices

Abstract

The computation of covariance and correlation matrices is critical to many data mining applications and processes. Unfortunately, the classical covariance and correlation matrices are very sensitive to outliers. Robust methods, such as Quadrant Correlation (QC) and the Maronna method, have been proposed. However, existing algorithms for QC perform acceptably only when the dimensionality of the matrix is in the hundreds, and the Maronna method is rarely used in practice because of its high computational cost. In this paper we develop parallel algorithms for both QC and the Maronna method. We evaluate these parallel algorithms on a real gene-expression data set covering over 6000 genes, which gives rise to a matrix of over 18 million entries. In our experimental evaluation, we explore scalability in dimensionality and in the number of processors, as well as the trade-offs between accuracy and computational efficiency. We also compare the parallel behaviours of the two methods. From a statistical standpoint, the Maronna method is more robust than QC. From a computational standpoint, although QC requires less computation, the Maronna method is, interestingly, much more parallelizable than QC. After thorough experimentation, we conclude that for many data mining applications both QC and Maronna are viable options. QC, which is less robust but faster, is the recommended choice for small parallel platforms. The Maronna method, on the other hand, is the recommended choice when a high degree of robustness is required, or when the parallel platform features a large number of processors (e.g., 32).
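To make the QC idea concrete, here is a minimal Python sketch. It assumes one common textbook definition of quadrant correlation (the average agreement of signs about each variable's median, mapped through sin(πr/2) so the estimate is consistent under normality); it is not the authors' parallel algorithm. All function names are hypothetical. The thread pool only illustrates how the p(p-1)/2 pairwise entries can be split across workers: CPython threads do not speed up this pure-Python loop, and the row-by-row split is deliberately naive (row i holds p-1-i pairs, so the work is unbalanced).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def quadrant_signs(column):
    """Signs of each observation relative to the column median (computed once per variable)."""
    m = median(column)
    return [sign(v - m) for v in column]


def quadrant_corr(sx, sy):
    """Quadrant correlation of two variables from their precomputed median signs.

    Assumption: r_Q = mean(sx * sy), then mapped through sin(pi/2 * r_Q),
    a common textbook normalization; the paper's exact variant may differ.
    """
    n = len(sx)
    rq = sum(a * b for a, b in zip(sx, sy)) / n
    return math.sin(math.pi / 2 * rq)


def qc_matrix(columns, workers=4):
    """Hypothetical helper: full p x p QC matrix, one task per matrix row.

    Each task i fills the cells (i, j) and (j, i) for j > i, so no two tasks
    ever write the same cell.
    """
    p = len(columns)
    signs = [quadrant_signs(c) for c in columns]
    R = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

    def fill_row(i):
        for j in range(i + 1, p):
            r = quadrant_corr(signs[i], signs[j])
            R[i][j] = R[j][i] = r

    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(fill_row, range(p)))  # list() surfaces any worker exception
    return R
```

For example, `qc_matrix([[1, 2, 3, 4], [2, 4, 6, 800], [4, 3, 2, 1]])` gives 1.0 for the first pair even though 800 is a gross outlier, and -1.0 for the two pairs involving the reversed column. Because each entry depends only on its own two columns, the same pairwise decomposition is what a distributed implementation would partition across processors.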