Conference: 2011 IEEE Symposium on Computational Intelligence and Data Mining

Clustering categorical data: A stability analysis framework

Abstract

Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but it is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which makes it difficult to select the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with 'noisy' data, since the calculation of the mode lacks the smoothing effect inherent in the calculation of the mean. Such noise is common in real-world datasets, for instance in the domain of Public Health, and results in solutions that can differ radically depending on the initialization and therefore lead to different interpretations. This paper presents two methodologies. The first addresses sensitivity to initialization using a generic landscape mapping of k-modes solutions. The second uses the landscape map to stabilize the partition clusters for discrete data by drawing a consensus sample in order to separate signal from noise components. Results are presented for the benchmark soybean disease dataset, an artificially generated dataset, and a case study involving Public Health data.
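To make the initialization sensitivity described in the abstract concrete, here is a minimal sketch of k-modes in plain Python. It assumes simple matching (Hamming) dissimilarity and initial prototypes drawn at random from the data. The function names (`hamming`, `column_modes`, `k_modes`) and the toy dataset are illustrative assumptions. This is not the paper's landscape mapping or consensus sampling procedure, which the abstract does not specify in detail.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most frequent value in each column of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, seed, max_iter=100):
    """Run k-modes once from a seeded random start; return (labels, cost)."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # assumption: prototypes start as data records
    labels = None
    for _ in range(max_iter):
        # Assign each record to its nearest mode (ties go to the lowest index).
        new_labels = [
            min(range(k), key=lambda j: hamming(row, modes[j])) for row in data
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Recompute each mode attribute by attribute.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster is empty
                modes[j] = column_modes(members)
    cost = sum(hamming(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, cost


# Hypothetical toy data: nine records with three categorical attributes.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "y", "q"), ("b", "y", "r"), ("b", "x", "r"),
    ("c", "z", "r"), ("c", "z", "p"), ("c", "x", "q"),
]

# Restart from several seeds and collect the distinct final costs.
runs = {seed: k_modes(data, k=3, seed=seed) for seed in range(20)}
print(sorted({cost for _, cost in runs.values()}))
```

If the printed list contains more than one cost, different initializations have converged to different local solutions. That is the situation in which choosing a single "best" partition becomes difficult.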
