Research on the Optimization of Base Clusterings in Cluster Ensembles

Cheng Kai (程凯); Zhong Caiming (钟才明); Pang Yongming (庞永明)


Abstract

A cluster ensemble integrates multiple partitions of a dataset (the base clusterings) into a new clustering that represents, to the greatest extent possible, the cluster-structure information carried by all the input base clusterings. The quality of the base clusterings is therefore crucial to the final ensemble result. K-means is the most widely used algorithm for producing base partitions: it is easy to implement, its computational cost is low, and its clustering mechanism conforms to the machine-learning assumption that the class-conditional probability of local data is constant. However, K-means usually uses Gaussian distance directly as its distance measure, so it can only find spherical clusters. For datasets with complex structure, especially those whose classes are connectivity-based and not spherically distributed, it cannot generate high-quality (that is, highly homogeneous) base clusterings. This paper therefore proposes an optimization method for base clusterings: judge the homogeneity of each cluster generated by K-means and partition the clusters with poor homogeneity again, which improves the homogeneity of the base clusterings and, in turn, the quality of the whole cluster ensemble. Experiments on 8 datasets show that the proposed method is effective.
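The judge-then-repartition idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how homogeneity is measured, so the sketch substitutes mean squared distance to the centroid (`scatter`) with a hand-picked threshold (`max_scatter`), uses plain squared Euclidean distance, and re-partitions a flagged cluster with 2-means. All function names and the toy data are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: recompute centers; keep the old center if a cluster is empty.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def scatter(points):
    """ASSUMED homogeneity proxy: mean squared distance to the centroid."""
    c = centroid(points)
    return sum(dist2(p, c) for p in points) / len(points)

def refine(points, labels, max_scatter):
    """Split every cluster whose scatter exceeds max_scatter using 2-means."""
    refined = list(labels)
    next_label = max(labels) + 1
    for j in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        members = [points[i] for i in idx]
        if len(members) < 2 or scatter(members) <= max_scatter:
            continue  # cluster is treated as homogeneous enough
        sub = kmeans(members, 2)
        for i, s in zip(idx, sub):
            if s == 1:
                refined[i] = next_label  # second half gets a fresh label
        next_label += 1
    return refined

# Toy data: three tight blobs, deliberately under-clustered with k=2.
blob = [(0, 0), (0, 1), (1, 0), (1, 1)]
points = [(x + dx, y) for dx in (0, 10, 50) for x, y in blob]
base = kmeans(points, 2)          # merges the first two blobs
refined = refine(points, base, max_scatter=5.0)
```

On this toy input the base K-means partition lumps the two nearby blobs into one cluster; the refinement step flags that cluster as poorly homogeneous under the assumed proxy and splits it, so `refined` contains three clusters. In an ensemble setting, refined partitions like this would replace the raw K-means partitions as inputs to the consensus step.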

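Bibliographic details

Source
Computer Applications and Software (《计算机应用与软件》) | 2017, No. 9 | pp. 267-272 | 6 pages
Authors
Cheng Kai (程凯); Zhong Caiming (钟才明); Pang Yongming (庞永明)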
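Author affiliations

College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315210;

College of Information Engineering, College of Science and Technology, Ningbo University, Ningbo, Zhejiang 315210;

College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315210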

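Original format: PDF
Language of full text: Chinese
Chinese Library Classification: Artificial intelligence theory
Keywords
cluster ensemble; K-means; base clustering; homogeneity; pseudo-Gaussian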

