Cluster analysis for the cloud: Parallel Competitive Fitness and parallel K-means++ for large dataset analysis

Abstract

The amount of resources needed to provision Virtual Machines (VMs) in a cloud computing system to support virtual HPC clusters can be predicted from the analysis of historic use data. In previous work, Hacker et al. found that cluster analysis is a useful tool for understanding the underlying spatio-temporal dependencies present in system fault and use logs. However, the cluster analysis used for reducing spatio-temporal dependencies should be fast and accurate in order to capture the underlying stochastic properties of these systems. K-means is a fast cluster analysis method whose accuracy depends on the use of initialization algorithms that are usually serial and slow. In this paper we present two new parallel strategies for fast seeding of K-means cluster analysis. Both strategies were tested on a real problem where the aim was to reduce spatial and temporal dependencies of failures on large supercomputer systems. The performance of both strategies was compared with five existing serial implementations: the K-means implementations of 1) Lloyd (L); 2) McQueen (M); and 3) Hartigan-Wong (HW), all of them using Forgy seeding; 4) K-means++; and 5) Neural Gas clustering (NG), a more recent and sophisticated method. Our results show that our new Parallel Competitive Fitness approach reduces the Within Sum of Squares (WSQQ) measure, thus increasing the cluster quality of the three K-means implementations (L, M, and HW), and is 200 times faster than the existing serial K-means++. The existing serial K-means++ and our new Parallel K-means++ have the lowest WSQQ. Our new Parallel K-means++ is twice as fast as the existing serial K-means++ method, and is 4 times faster than the NG method. Moreover, our new methods did not generate empty clusters, while NG did. As a result of our new techniques, predicting the amount of resources needed to provision VMs by processing historic system fault and use data can now be done faster and with more accuracy.
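For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of two of them: the within-cluster sum of squares (called WSQQ above) and the standard serial K-means++ seeding rule, in which each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. It is not the authors' parallel method, which the abstract does not describe in detail. The function names `sq_dist`, `wss`, `forgy_seeds`, and `kmeanspp_seeds` and the toy data are assumptions made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wss(points, centers):
    """Within sum of squares: each point contributes its squared
    distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def forgy_seeds(points, k, rng):
    """Forgy seeding: pick k distinct data points uniformly at random."""
    return rng.sample(points, k)


def kmeanspp_seeds(points, k, rng):
    """Serial K-means++ seeding (illustrative, not the paper's parallel variant)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data (hypothetical): three loose groups in the plane.
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(50)]
    print("Forgy seeds WSS:    ", wss(data, forgy_seeds(data, 3, rng)))
    print("K-means++ seeds WSS:", wss(data, kmeanspp_seeds(data, 3, rng)))
```

The sketch only scores the initial seeds. A full comparison would run K-means iterations from each seeding and compare the final WSS and running time, as the abstract reports.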

Bibliographic details

Source
2012 IEEE 4th International Conference on Cloud Computing Technology and Science | 2012 | pp. 177-184 | 8 pages
Conference location: Taipei (CT)
Authors
Esteves Rui Maximo; Hacker Thomas; Rang Chunming
Author affiliation

Department of Electrical and Computer Engineering, University of Stavanger, Norway

Original format: PDF
Text language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Clustering; K-means; Neural Gas; VM resources prediction; parallel K-means++

Similar literature

1. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets [J]. D. D. Shrimankar, S. R. Sathe. Bioinformatics and Biology Insights. 2016, issue Supplaa2
2. Analysis of Simple K-Mean and Parallel K-Mean Clustering for Software Products and Organizational Performance Using Education Sector Dataset [J]. Rui Shang, Balqees Ara, Islam Zada. Scientific Programming. 2021, issue a
3. Cloud-Based Grasp Analysis and Planning for Toleranced Parts Using Parallelized Monte Carlo Sampling [J]. Kehoe Ben, Warrier Deepak, Patil Sachin. IEEE Transactions on Automation Science and Engineering. 2015, issue 2
4. Cluster analysis for the cloud: Parallel Competitive Fitness and parallel K-means++ for large dataset analysis [C]. Esteves Rui Maximo, Hacker Thomas, Rang Chunming. IEEE International Conference on Cloud Computing Technology and Science. 2012
5. Analysis of a parallel stratiform mesoscale convective system during the midlatitude continental convective clouds experiment [D]. Neumann, Andrea J. 2012
6. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets [O]. D. D. Shrimankar, S. R. Sathe. 2016
7. Analysis of Simple K-Mean and Parallel K-Mean Clustering for Software Products and Organizational Performance Using Education Sector Dataset [O]. Rui Shang, Balqees Ara, Islam Zada. 2021
