Parallel Data Analysis and its Implementation on PC Cluster


Abstract

Numerous datasets can be easily downloaded or extracted from the Internet, with or without a specific purpose. The number of records in them is so large that we must find ways to classify them when we want to extract useful information. Many computer-intensive data analysis methods have been developed. Some of them, however, are not designed for execution on huge datasets and tend to consume a great deal of time.

We have studied how to parallelize typical data analysis methods and have pursued parallel-oriented data analysis using a PC cluster. A PC cluster provides sufficient computing power, but to use it we must answer the question, "How can we adapt well-known statistical methods to a PC cluster?"

In this study, we focus on k-means, a well-known non-hierarchical clustering method, and report how to implement it properly in a parallel environment. We have previously reported parallel execution of k-means in a PVM environment; however, MPI, another PC cluster library, is more popular and better suited to k-means. Through numerical examples, we show its effectiveness and offer some viewpoints on parallel-oriented statistical analysis.
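
The abstract does not describe how the computation is divided among nodes. As an illustration only, the following minimal sketch shows one common data-parallel formulation of k-means: each worker assigns its own share of the records to the nearest centroid and returns per-cluster sums and counts, and a reduction step combines those partial results to update the centroids. The function names (`partial_stats`, `parallel_kmeans`) and parameters are hypothetical, and a thread pool stands in for the cluster nodes so the sketch runs on one machine. It shows the structure of the computation rather than a speedup. In an MPI program, the reduction step would typically be a collective operation such as `MPI_Allreduce`.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk, centroids):
    """Worker step: assign each local point to its nearest centroid and
    return per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        # Squared Euclidean distance is enough to find the nearest centroid.
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def parallel_kmeans(points, init_centroids, n_workers=2, n_iter=10):
    """Data-parallel k-means sketch (hypothetical helper, not the paper's code)."""
    # Partition the records once; each worker keeps the same chunk throughout.
    chunks = [points[i::n_workers] for i in range(n_workers)]
    centroids = [list(c) for c in init_centroids]
    k, dim = len(centroids), len(centroids[0])
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iter):
            # "Broadcast" the current centroids and collect the partial results.
            results = list(pool.map(partial_stats, chunks, [centroids] * n_workers))
            # Reduction: add up the partial sums and counts from all workers.
            total_sums = [[0.0] * dim for _ in range(k)]
            total_counts = [0] * k
            for sums, counts in results:
                for j in range(k):
                    total_counts[j] += counts[j]
                    for d in range(dim):
                        total_sums[j][d] += sums[j][d]
            # Update step: an empty cluster keeps its previous centroid.
            centroids = [
                [s / total_counts[j] for s in total_sums[j]]
                if total_counts[j] else centroids[j]
                for j in range(k)
            ]
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(parallel_kmeans(data, [(0, 0), (10, 10)], n_workers=3))
```

Only the k centroids and the k partial sums and counts are exchanged in each iteration, so the communication volume does not grow with the number of records. This property is what makes the formulation a natural fit for a reduction-style collective.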
