Ensemble Learning Based Distributed Clustering

Abstract

Data mining techniques such as clustering are usually applied to centralized data sets. Today, however, more and more data is generated and stored at local sites. Transmitting an entire local data set to a central server is often unacceptable because of performance considerations, privacy and security concerns, and bandwidth constraints. In this paper, we propose a distributed clustering model based on ensemble learning that analyzes and mines distributed data sources to find global clustering patterns. A typical distributed clustering scenario is a two-stage process: clustering is performed first at the local sites and then at the global site. The local clustering results transmitted to the server site form an ensemble, and the combining schemes of ensemble learning use this ensemble to generate the global clustering results. In the model, generating global patterns from the ensemble is formulated mathematically as a combinatorial optimization problem. As an implementation of the model, we present a novel distributed clustering algorithm called DK-means. Experimental results show that DK-means achieves results similar to those of K-means applied to the centralized data set all at once, and that it is scalable to data distributions that vary across local sites, which demonstrates the validity of the model.
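The two-stage process described in the abstract can be illustrated with a small sketch. The abstract does not give the details of DK-means or of its combinatorial optimization formulation, so the code below is not the authors' algorithm. It assumes a simple stand-in combining scheme: each site runs K-means locally and sends only its centroids and cluster sizes, and the server runs a size-weighted K-means over those centroids. The names `weighted_kmeans`, `local_summary`, and `global_clustering` are hypothetical.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd's K-means where every point carries a weight.

    Returns (centers, mass), where mass[j] is the total weight that
    produced centers[j] in the final iteration.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    mass = [0.0] * k
    for _ in range(n_iter):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            if mass[j] > 0:  # an empty cluster keeps its previous center
                centers[j] = [s / mass[j] for s in sums[j]]
    return centers, mass


def local_summary(data, k):
    """Stage 1 (local site): cluster locally, keep only (centroid, size)."""
    centers, mass = weighted_kmeans(data, [1.0] * len(data), k)
    return [(c, m) for c, m in zip(centers, mass) if m > 0]


def global_clustering(summaries, k):
    """Stage 2 (server): cluster the ensemble of local centroids,
    weighting each centroid by the number of points it represents."""
    points = [c for summary in summaries for c, _ in summary]
    weights = [m for summary in summaries for _, m in summary]
    centers, _ = weighted_kmeans(points, weights, k)
    return centers


if __name__ == "__main__":
    rng = random.Random(1)

    def blob(mu, n):
        # Synthetic 2-D points around (mu, mu); illustrative data only.
        return [(rng.gauss(mu, 0.5), rng.gauss(mu, 0.5)) for _ in range(n)]

    sites = [blob(0.0, 40) + blob(10.0, 40) for _ in range(3)]
    summaries = [local_summary(site, k=3) for site in sites]
    print(global_clustering(summaries, k=2))
```

In this sketch only the centroids and cluster sizes leave each site, which reflects the motivation stated in the abstract: the raw local data is never transmitted. Weighting by cluster size keeps large local clusters from being outvoted by small ones when the server combines the ensemble.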
