Parallel database processing on a 100 Node PC cluster

Abstract

We developed a PC cluster system consisting of 100 PCs. Each PC has a 200 MHz Pentium Pro CPU and is connected to the others through an ATM switch. We selected two kinds of data-intensive applications: decision support query processing and data mining, specifically association rule mining.

As a high-speed network technology, ATM has recently become a de facto standard. While other high-performance network standards are also available, ATM networks are widely used in settings ranging from local area to widely distributed environments. One problem with ATM networks is their high latency, in contrast to their high bandwidth. This is usually considered a serious flaw of ATM for building high-performance massively parallel processors. However, applications such as large-scale database analysis are insensitive to communication latency and require only bandwidth.

Meanwhile, the performance of personal computers is increasing rapidly, while PC prices continue to fall much faster than workstation prices. The 200 MHz Pentium Pro CPU is competitive in integer performance with the processor chips found in workstations. Although it is still weak at floating-point operations, these are not frequently used in database applications.

Thus, by combining PCs and ATM switches, we can construct a large-scale parallel platform very easily and inexpensively. In this paper, we examine how such a system can help data warehouse processing, which currently runs on expensive high-end mainframes and/or workstation servers.

In our first experiment, we ran the most complex query of the standard TPC-D benchmark on a 100 GB database to compare the system with commercial parallel systems. Our PC cluster exhibited much higher performance than the systems in current TPC benchmark reports. Second, we parallelized association rule mining and ran large-scale data mining on the PC cluster, obtaining sufficiently high linearity. Thus, we believe that such commodity-based PC clusters will play a very important role in large-scale database processing.
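The abstract's point that bulk database transfers depend on bandwidth rather than latency can be illustrated with a simple cost model, in which sending a message costs a fixed latency plus its size divided by the bandwidth. The sketch below is not taken from the paper: the function names and the link parameters (1 ms latency, 15 MB/s bandwidth) are assumptions chosen only for illustration.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost model: fixed per-message latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def latency_share(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is spent on latency."""
    total = transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


# Hypothetical link parameters, for illustration only (not measured values).
LATENCY_S = 1e-3    # assumed 1 ms per message
BANDWIDTH = 15e6    # assumed 15 MB/s sustained

# Compare a tiny control message with a bulk data block.
for size in (64, 64 * 1024, 64 * 1024 * 1024):
    share = latency_share(size, LATENCY_S, BANDWIDTH)
    print(f"{size:>10} bytes: latency is {share:.2%} of transfer time")
```

Under these assumed parameters, latency dominates small messages but becomes a negligible fraction once blocks reach megabyte sizes, which is the regime the abstract attributes to large-scale database analysis.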

Bibliographic record

Source: ACM/IEEE Conference on Supercomputing, 1997, pp. 1-16 (16 pages)
Authors: Takayuki Tamura; Masato Oguchi; Masaru Kitsuregawa
Original format: PDF
Classification (Chinese Library Classification): Computing technology, computer technology

