Hierarchical Clustering in Scalable Distributed Two-Layer Datastore for Big Data as a Service

Abstract

In this paper we propose a highly scalable approach to data clustering that can be applied in cloud-based big data services. We present a hierarchical approach to automatic data clustering in a Scalable Distributed Two-Layer Datastore (SD2DS) system, obtained by extending the LH* scheme so that data items are addressed based on their content. We show that as the bucket structure grows, the total clustering error decreases. Moreover, our method allows new data items to be added to the structure incrementally and enables parallel data processing. We carried out simulations for 3 different cluster shapes and 5 different noise ratios to demonstrate the correctness of our solution. Additionally, we compare our solution with common clustering methods such as K-means, agglomerative clustering and BIRCH.
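
The abstract builds on the LH* scheme without restating how LH* addresses a record. As background only, the following Python sketch shows the classic key-based LH* rule: with file level i and split pointer n, a key C is sent to a = h_i(C), or to h_{i+1}(C) when a < n, where h_i(C) = C mod 2^i (assuming the file starts with a single bucket). The content-based variant proposed in the paper is not described in the abstract and is not reproduced here; the helper names h, lh_address and split_next are hypothetical.

```python
from typing import Tuple


def h(i: int, key: int) -> int:
    # Linear-hashing family h_i(C) = C mod 2**i, assuming one initial bucket.
    return key % (2 ** i)


def lh_address(key: int, level: int, split_pointer: int) -> int:
    """Bucket address of `key` for file level i and split pointer n (hypothetical helper)."""
    a = h(level, key)
    # Buckets below the split pointer were already split at this level,
    # so keys hashing there are re-addressed with the next hash function.
    if a < split_pointer:
        a = h(level + 1, key)
    return a


def split_next(level: int, split_pointer: int) -> Tuple[int, int]:
    """Advance (i, n) after bucket n splits into buckets n and n + 2**i."""
    split_pointer += 1
    # Once every bucket of the current round has split, start a new round.
    if split_pointer >= 2 ** level:
        level, split_pointer = level + 1, 0
    return level, split_pointer


if __name__ == "__main__":
    level, n = 0, 0  # the file starts as a single bucket 0
    for _ in range(5):  # five splits produce six buckets
        level, n = split_next(level, n)
    print(level, n)  # 2 2, i.e. 2**2 + 2 = 6 buckets
    print([lh_address(k, level, n) for k in range(12)])
```

For example, with i = 2 and n = 1 (buckets 0 to 4), key 12 goes to bucket 4 because bucket 0 has already split, while key 13 stays in bucket 1. Tracking only the pair (i, n) is what lets a linear-hashing file grow one bucket at a time instead of doubling.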

Bibliographic details

Source: International Conference on Enterprise Systems | 2018 | p. 226 | 8 pages
Authors: Adam Krechowicz; Stanisław Deniziak
Original format: PDF
CLC classification: F270.7-53
Keywords: Big Data; Distributed databases; Cloud computing; Tools; Data structures; Clustering methods

Similar literature

1. Load Balancing of Distributed Datastore in OpenDaylight Controller Cluster [J]. Kim Taehong, Myung Jungho, Yoo Seong-eun. IEEE Transactions on Network and Service Management, 2019, no. 1.
2. Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes [J]. Or Dinari, Oren Freifeld. JMLR: Workshop and Conference Proceedings, 2020, no. 2010.
3. Hierarchical Clustering in Scalable Distributed Two-Layer Datastore for Big Data as a Service [C]. Adam Krechowicz, Stanisław Deniziak. International Conference on Enterprise Systems, 2018.
4. Functionally homogeneous clustering: A framework for building scalable data-intensive Internet services [D]. Saito, Yasushi. 2001.
5. CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis [O]. Olga Permiakova, Romain Guibert, Alexandra Kraut, 2021.
6. Figure 2: Hierarchical maps using Ward as the clustering method and (A) raw data, (B) scaled data, (C) data reduced by PCA and (D) data scaled and reduced by PCA [O].
