International Conference on Enterprise Systems

Hierarchical Clustering in Scalable Distributed Two-Layer Datastore for Big Data as a Service

Abstract

In this paper we propose a highly scalable approach to data clustering that can be applied in cloud-based big data services. We present a hierarchical approach that creates an automatic data clustering in a Scalable Distributed Two-Layer Datastore (SD2DS) system by extending the LH* schema so that data items can be addressed by their content. We found that as the bucket structure grows, the total clustering error decreases. Moreover, our method allows new data items to be added to the structure incrementally and enables parallel data processing. We carried out simulations for 3 different cluster shapes and 5 different noise ratios to verify the correctness of our solution. Additionally, we compare our solution with common clustering methods such as K-means, agglomerative clustering, and BIRCH.
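
For readers unfamiliar with LH*, the sketch below shows the classic linear-hashing address computation that the LH* family builds on, applied to integer keys. It is not the content-based extension proposed in the paper, whose details the abstract does not give. The function name `lh_address` and its parameter names are hypothetical and chosen only for illustration.

```python
def lh_address(key: int, level: int, split_pointer: int,
               initial_buckets: int = 1) -> int:
    """Return the bucket number for `key` under classic linear hashing.

    Assumptions (illustrative, not taken from the paper):
      - keys are non-negative integers;
      - the hash family is h_i(key) = key mod (initial_buckets * 2**i);
      - `level` is the current file level i;
      - `split_pointer` is the next bucket to be split at this level.
    """
    if key < 0 or level < 0 or initial_buckets < 1:
        raise ValueError("key, level must be >= 0 and initial_buckets >= 1")
    buckets_at_level = initial_buckets * 2 ** level
    if not 0 <= split_pointer < buckets_at_level:
        raise ValueError("split_pointer must lie in [0, buckets_at_level)")

    # First try the hash function of the current level.
    address = key % buckets_at_level
    # Buckets below the split pointer have already been split in this
    # round, so their keys are spread using the next-level function.
    if address < split_pointer:
        address = key % (buckets_at_level * 2)
    return address


if __name__ == "__main__":
    # File at level 2 with bucket 0 already split: buckets 0..4 exist.
    for key in (5, 7, 8, 12):
        print(key, "->", lh_address(key, level=2, split_pointer=1))
```

Because buckets are split one at a time, the structure grows incrementally without rehashing every item, which is the property that makes this family of schemes attractive for scalable distributed stores.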