Clustering Large Categorical Data

Abstract

Clustering methods often come down to the optimization of a numeric criterion defined from a distance or a dissimilarity measure. It can be shown that this problem is often equivalent to estimating the parameters of a probabilistic model under the classification likelihood approach. For instance, the inertia criterion optimized by the k-means algorithm corresponds to the hypothesis that the population arises from a Gaussian mixture. In this paper, we propose a mixture model adapted to categorical data. Using the classification likelihood approach, we develop the Classification EM algorithm (CEM) to estimate the parameters of this mixture model. With our probabilistic model, the data are not denatured, and the estimated parameters directly indicate the characteristics of the clusters. This probabilistic approach also provides an interpretation of the criterion optimized by the k-modes algorithm, an extension of k-means to categorical attributes, and allows us to study the behavior of this algorithm.
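The link between the classification likelihood and the k-modes criterion can be sketched in Python. This is a minimal sketch under assumptions of our own, not necessarily the model or code of the paper. We assume a restricted latent class model in which cluster k has a modal category a_k^j for each attribute j. The modal category has probability 1 - ε, and each of the other m - 1 categories has probability ε/(m - 1). We further assume equal proportions (1/g) and the same number m of categories for every attribute. Under these assumptions, the classification log-likelihood of a partition z of n objects with d attributes into g clusters is

L_C(z, θ) = nd·log(1 - ε) - n·log g - W(z, a)·log((1 - ε)(m - 1)/ε),

where W(z, a) is the k-modes criterion, the total number of mismatches between objects and their cluster modes. For fixed ε < (m - 1)/m, the coefficient of W is negative, so maximizing L_C over z and a is the same as minimizing W. In the CEM loop below, the classification step therefore assigns each object to its nearest mode. The M step takes per-cluster modes (the k-modes update) and sets ε = W/(nd). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def mismatches(x, mode):
    """Simple-matching dissimilarity used by k-modes: number of differing attributes."""
    return sum(a != b for a, b in zip(x, mode))


def kmodes_criterion(data, labels, modes):
    """W(z, a): total mismatches between each object and the mode of its cluster."""
    return sum(mismatches(x, modes[k]) for x, k in zip(data, labels))


def log_density(x, mode, eps, m):
    """log P(x | cluster) under the ASSUMED model:
    P(x^j = h) = 1 - eps if h equals the cluster mode a^j, else eps / (m - 1)."""
    return sum(math.log(1 - eps) if a == b else math.log(eps / (m - 1))
               for a, b in zip(x, mode))


def classification_loglik(data, labels, modes, eps, m, g):
    """Classification log-likelihood L_C(z, theta) with equal proportions 1/g."""
    return sum(math.log(1 / g) + log_density(x, modes[k], eps, m)
               for x, k in zip(data, labels))


def cem_categorical(data, g, m, max_iter=50, seed=0):
    """Hypothetical CEM sketch for the assumed model; returns (labels, modes, eps).
    Requires m >= 2 and at least g distinct objects in data."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    distinct = list(dict.fromkeys(tuple(x) for x in data))
    modes = [list(x) for x in rng.sample(distinct, g)]  # start from g distinct objects
    eps = 0.5 * (m - 1) / m                             # any start below (m - 1) / m
    labels = None
    for _ in range(max_iter):
        # E and C steps: with equal proportions the largest posterior is the
        # largest log-density, so each object goes to that cluster (ties -> lower k).
        new_labels = [max(range(g), key=lambda k: log_density(x, modes[k], eps, m))
                      for x in data]
        if new_labels == labels:                        # partition stable: converged
            break
        labels = new_labels
        # M step, part 1: per-cluster, per-attribute most frequent category.
        for k in range(g):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:                                 # keep the old mode if a cluster empties
                modes[k] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
        # M step, part 2: ML estimate of eps is the overall mismatch rate W / (n d),
        # clamped away from 0 so that log(eps) stays finite.
        eps = max(kmodes_criterion(data, labels, modes) / (n * d), 1e-9)
    return labels, modes, eps
```

Consider a toy data set made of four copies of ("a", "x", "p"), one ("a", "x", "q"), four copies of ("b", "y", "q"), and one ("b", "y", "p"). On this data, `cem_categorical(data, g=2, m=2)` separates the two groups with modes ("a", "x", "p") and ("b", "y", "q"), W = 2, and ε = 2/30. The cluster indices may come out in either order.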

Bibliographic record

Source: 《》 | 2002 | pp. 257-263 | 7 pages
Authors: Francois-Xavier Jollois; Mohamed Nadif
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. Weighted Delta Factor Cluster Ensemble Algorithm for Categorical Data Clustering in Data Mining [J]. Sengottaian Sarumathi, Natesan Shanthi, Mathivanan Sharmila. The International Arab Journal of Information Technology. 2017, No. 3
2. An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data [J]. Liang Bai, Jiye Liang, Chuangyin Dang. Knowledge-Based Systems. 2011, No. 6
3. High-performance link-based cluster ensemble approach for categorical data clustering [J]. Yuvaraj N., Dhas C. Suresh Ghana. Journal of Supercomputing. 2020, No. 6
4. A Data Labeling Method for Categorical Data Clustering Using Cluster Entropies in Rough Sets [C]. Reddy H.Venkateswara, Kumar B.Suresh, Viswanadharaju S. International Conference on Communication Systems and Network Technologies. 2014
5. Automatic categorical data clustering and spatial data clustering by consecutive resolution refinement [D]. Foss, Andrew Philip Ogilvie. 2002
6. Evaluation of Modified Categorical Data Fuzzy Clustering Algorithm on the Wisconsin Breast Cancer Dataset [O]. Amir Ahmad. 2016
7. Comparing of EA K-modes clustering and NBEA K-modes clustering, A new method for clustering categorical data applying them on the injecting drug users data set [O]. Zamani Nasab Zahra. 2017