Ensemble Learning Based Distributed Clustering



Abstract

Data mining techniques such as clustering are usually applied to centralized data sets. Increasingly, however, data is generated and stored at local sites. Transmitting an entire local data set to a server is often unacceptable because of performance considerations, privacy and security concerns, and bandwidth constraints. In this paper, we propose a distributed clustering model based on ensemble learning that can analyze and mine distributed data sources to find global clustering patterns. A typical distributed clustering scenario is a two-stage process: clustering is performed first at the local sites and then at the global site. The local clustering results transmitted to the server site form an ensemble, and the combining schemes of ensemble learning use this ensemble to generate the global clustering results. In the model, generating global patterns from the ensemble is formulated mathematically as a combinatorial optimization problem. As an implementation of the model, a novel distributed clustering algorithm called DK-means is presented. Experimental results show that DK-means achieves results similar to those of K-means run on the centralized data set in a single pass, that it scales to data distributions that vary across local sites, and that the model is valid.
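
The two-stage process described in the abstract can be illustrated with a small sketch. The abstract does not give the details of DK-means, so the code below is not the authors' algorithm: it assumes each site sends only its local centroids and cluster sizes to the server, and it substitutes a plain weighted K-means over those centroids for the paper's combinatorial optimization step. The names `weighted_kmeans`, `local_summary`, and `global_clustering`, as well as the toy data, are hypothetical.

```python
import numpy as np

def _assign(points, centroids):
    """Index of the nearest centroid for every point (squared Euclidean)."""
    dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's algorithm with per-point weights; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation, chosen only to keep the toy example stable.
    chosen = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in chosen], axis=0)
        chosen.append(points[np.argmax(d)])
    centroids = np.array(chosen, dtype=float)
    for _ in range(iters):
        labels = _assign(points, centroids)
        for j in range(k):
            mask = labels == j
            if mask.any():  # leave an empty cluster's centroid where it is
                centroids[j] = np.average(points[mask], axis=0, weights=weights[mask])
    return centroids, _assign(points, centroids)

def local_summary(data, k):
    """Stage 1 (local site): cluster locally, keep only centroids and sizes."""
    centroids, labels = weighted_kmeans(data, np.ones(len(data)), k)
    counts = np.bincount(labels, minlength=k)
    keep = counts > 0
    return centroids[keep], counts[keep].astype(float)

def global_clustering(summaries, k):
    """Stage 2 (server): combine the ensemble of local results.
    Assumption: weighted K-means stands in for the paper's combining scheme."""
    centroids = np.vstack([c for c, _ in summaries])
    counts = np.concatenate([n for _, n in summaries])
    return weighted_kmeans(centroids, counts, k)

# Toy data: three hypothetical sites, each seeing a different mix of clusters.
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
mixes = [(0, 1), (1, 2), (0, 1, 2)]
sites = [
    np.vstack([rng.normal(true_centers[c], 0.5, size=(200, 2)) for c in mix])
    for mix in mixes
]
summaries = [local_summary(x, k=3) for x in sites]
global_centroids, _ = global_clustering(summaries, k=3)
print(np.round(global_centroids, 1))
```

Only the centroids and counts leave each site, which is the point of the scheme: the raw local data never has to be transmitted to the server.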


Bibliographic record

Source
PAKDD (Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining) 2007 International Workshops; 2007-05-22; Nanjing (CN) | 2007 | pp. 312-321 | 2 pages in total
Conference location: Nanjing (CN)
Authors
Genlin Ji; Xiaohan Ling;
Author affiliation

Department of Computer Science, Nanjing Normal University, Nanjing 210097, P.R. China;

Original format: PDF
Text language: English
CLC classification: Programming; software engineering
Keywords
distributed clustering; ensemble learning; data mining;


Similar literature


1. Weighted clustering ensemble: Towards learning the weights of the base clusterings [J]. Baroudi Rouba, Safia Nait Bahloul. Multiagent and Grid Systems, 2017, No. 4.
2. Evolutionary Cluster-Based Synthetic Oversampling Ensemble (ECO-Ensemble) for Imbalance Learning [J]. Pin Lim, Chi Keong Goh, Kay Chen Tan. IEEE Transactions on Cybernetics, 2017, No. 9.
3. A clustering ensemble learning method based on the ant colony clustering algorithm [J]. Hamid Parvin, Iman Jafari, Farhad Rad. International Journal of Innovative Computing and Applications, 2017, No. 3.
4. Ensemble Learning Based Distributed Clustering [C]. Genlin Ji, Xiaohan Ling. PAKDD (Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining) 2007 International Workshops, 2007.
5. Relationship-based clustering and cluster ensembles for high-dimensional data mining [D]. Alexander Strehl. 2002.
6. Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes [O]. Anna Jurek, Chris Nugent, Yaxin Bi. 2014.
7. K-metamodes: frequency- and ensemble-based distributed k-modes clustering for security analytics [O]. Andrey Sapegin, Christoph Meinel. 2020.
