Conference: ACM/IEEE conference on Supercomputing

Parallel database processing on a 100 Node PC cluster

Abstract

We developed a PC cluster system consisting of 100 PCs. Each PC uses a 200 MHz Pentium Pro CPU and is connected to the others through an ATM switch. We chose two kinds of data-intensive applications: decision support query processing and data mining, specifically association rule mining.

As a high-speed network technology, ATM has recently become a de facto standard. Although other high-performance network standards are available, ATM networks are widely used in environments ranging from local area networks to widely distributed systems. One problem with ATM networks is their high latency relative to their high bandwidth. This is usually considered a serious flaw of ATM for building high-performance massively parallel processors. However, applications such as large-scale database analysis are insensitive to communication latency and depend mainly on bandwidth.

Meanwhile, the performance of personal computers is increasing rapidly, while PC prices continue to fall much faster than those of workstations. The 200 MHz Pentium Pro CPU is competitive in integer performance with the processors used in workstations. Although it is still weak at floating-point operations, such operations are used infrequently in database applications.

Thus, by combining PCs and ATM switches, we can build a large-scale parallel platform easily and inexpensively. In this paper, we examine how such a system can support data warehouse processing, which currently runs on expensive high-end mainframes and/or workstation servers.

In our first experiment, we evaluated the system against commercial parallel systems using the most complex query of the standard TPC-D benchmark on a 100 GB database. Our PC cluster exhibited much higher performance than the systems in current TPC benchmark reports. Second, we parallelized association rule mining and ran large-scale data mining on the PC cluster, obtaining sufficiently high linearity in performance scaling. We therefore believe that such commodity-based PC clusters will play a very important role in large-scale database processing.
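The abstract does not describe how association rule mining was parallelized. As an illustration only, the sketch below shows Count Distribution, a common data-parallel variant of Apriori, which is a swapped-in generic technique and not necessarily the authors' algorithm. The transaction database is partitioned across nodes, every node counts the same candidate itemsets over its local partition, and the per-node counts are summed before the next pass. The helper names (`partition`, `local_count`, `generate_candidates`, `count_distribution_apriori`) are hypothetical, and the nodes are simulated sequentially in one process rather than on a real cluster. Each pass exchanges one batch of counts per node instead of many small messages, which illustrates why such bulk workloads can tolerate high per-message latency.

```python
from collections import Counter
from itertools import combinations


def partition(transactions, num_nodes):
    """Hypothetical helper: spread transactions round-robin over nodes,
    as if each PC held one partition on its local disk."""
    parts = [[] for _ in range(num_nodes)]
    for i, t in enumerate(transactions):
        parts[i % num_nodes].append(frozenset(t))
    return parts


def local_count(part, candidates):
    """One node scans only its own partition and counts every candidate."""
    counts = Counter()
    for t in part:
        for c in candidates:
            if c <= t:  # candidate itemset is contained in the transaction
                counts[c] += 1
    return counts


def generate_candidates(frequent_prev, k):
    """Apriori join and prune: build k-itemsets from frequent (k-1)-itemsets,
    keeping only those whose every (k-1)-subset is frequent."""
    prev = set(frequent_prev)
    cands = set()
    for a in prev:
        for b in prev:
            u = a | b
            if len(u) == k and all(frozenset(s) in prev for s in combinations(u, k - 1)):
                cands.add(u)
    return cands


def count_distribution_apriori(transactions, num_nodes, min_support):
    """Simulated Count Distribution: returns {itemset: global support count}."""
    parts = partition(transactions, num_nodes)
    # Pass 1: each node counts single items locally; summing the Counters
    # stands in for a global reduction over the network.
    item_counts = sum((Counter(i for t in p for i in t) for p in parts), Counter())
    level = {frozenset([i]): n for i, n in item_counts.items() if n >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        # Every node receives the same candidate set (replicated, not partitioned).
        candidates = generate_candidates(level, k)
        # Local scans on each node, then one reduction of the counts per pass.
        total = sum((local_count(p, candidates) for p in parts), Counter())
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent


if __name__ == "__main__":
    db = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]
    for itemset, support in count_distribution_apriori(db, num_nodes=2, min_support=3).items():
        print(sorted(itemset), support)
```

Because each node scans only its own partition and the reduction is a plain sum, the frequent itemsets do not depend on the number of nodes; the per-pass scan work shrinks roughly in proportion to the node count, while the candidate set is replicated on every node.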