Parallel Data Analysis and its Implementation on PC Cluster


Abstract

Numerous datasets can easily be downloaded or extracted from the Internet, with or without a specific purpose. They contain so many records that we must find ways to classify them before we can extract useful information. Many computer-intensive data analysis methods have been developed, but some of them were not designed to run on huge datasets and therefore tend to waste much time.

We have studied how to parallelize typical data analysis methods and have sought parallel-oriented data analysis using a PC cluster. A PC cluster gives us enough computing power, but to utilize it we have to solve the problem, "How can we adapt well-known statistical methods to a PC cluster?"

In this study, we focus on k-means, one of the well-known non-hierarchical clustering methods, and report how to implement it properly in a parallel environment. We have already reported parallel execution of k-means in a PVM environment; however, MPI, another PC cluster library, is more popular and more suitable for k-means. Through numerical examples, we show its effectiveness and offer some viewpoints for parallel-oriented statistical analysis.
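The abstract does not spell out how the k-means computation is divided among cluster nodes. As a rough illustration only, the sketch below follows one common data-parallel formulation, which may differ from the authors' implementation: each worker holds a chunk of the records, computes per-cluster partial sums and counts against the current centers, and the partial results are then combined to update the centers. It is written in plain Python with no MPI dependency; the list of chunks stands in for the data held by separate processes, and the hypothetical helper `reduce_partials` plays the role that a collective sum such as `MPI_Allreduce` would play on a real cluster. All function names here are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_step(chunk: Sequence[Point], centers: Sequence[Point]):
    """Work one worker does on its own chunk: assign each point to its
    nearest center and accumulate per-cluster coordinate sums and counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in chunk:
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def reduce_partials(partials, k: int, dim: int):
    """Combine the workers' partial results (stand-in for a collective sum)."""
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    return total_sums, total_counts


def parallel_kmeans(chunks: Sequence[Sequence[Point]],
                    centers: Sequence[Point],
                    max_iter: int = 100) -> List[Point]:
    """Iterate local steps and reductions until the centers stop changing."""
    centers = [tuple(c) for c in centers]
    k, dim = len(centers), len(centers[0])
    for _ in range(max_iter):
        # On a cluster these local steps would run concurrently, one per node.
        partials = [local_step(chunk, centers) for chunk in chunks]
        sums, counts = reduce_partials(partials, k, dim)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

In this formulation only the k partial sums and counts travel between workers in each iteration, not the records themselves, so the communication cost does not grow with the number of records.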


Bibliographic record

Source
8th World Multi-Conference on Systemics, Cybernetics and Informatics (SCI 2004), vol. 5: Computer Science and Engineering | 2004 | pp. 75-79 | 5 pages
Conference location: Orlando, FL (US)
Authors
Hiroyuki MINAMI; Masahiro MIZUTA
Author affiliation

Information Initiative Center, Hokkaido University, N11W5, Kita-ku, Sapporo 060-0811 JAPAN

Original format: PDF
Text language: English (eng)
CLC classification: Computing technology, computer technology
Keywords
computational statistics; huge data mining; parallel virtual machine

Similar literature


1. Implementation of Parallel Data Mining on an ATM-Connected PC Cluster and Performance of TCP Retransmission [J] . Masato Oguchi, Takayuki Tamura, Takahiko Shintani Electronics & Communications in Japan. 1 . 1999, No. 7
2. Dynamic data declustering on SAN-connected PC cluster for parallel data mining [J] . Masato Oguchi, Masaru Kitsuregawa 電子情報通信学会技術研究報告. デ-タ工学. Data Engineering . 2001, No. 191
3. Dynamic data declustering on SAN-connected PC cluster for parallel data mining [J] . Masato Oguchi, Masaru Kitsuregawa 電子情報通信学会技術研究報告. デ-タ工学. Data Engineering . 2001, No. 191
4. Parallel Data Analysis and its Implementation on PC Cluster [C] . Hiroyuki MINAMI, Masahiro MIZUTA 8th World Multi-Conference on Systemics, Cybernetics and Informatics(SCI 2004) vol.5: Computer Science and Engineering . 2004

5. The database implementation and algorithm design of qPCR-DAMS: A database tool to analyze, manage, and store quantitative real-time PCR data [D] . He, Keyu 2007

6. Novel Hybrid GPU–CPU Implementation of Parallelized Monte Carlo Parametric Expectation Maximization Estimation Method for Population Pharmacokinetic Data Analysis [O] . C. M. Ng 2013

7. Benchmarking of three parallelized implementations of LS-Dyna on a HPC server cluster [O] . Quinton Bruce, Kearsey Anthony 2011

