Journal: Expert Systems with Applications

A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional


Abstract

The Gath-Geva (GG) algorithm is one of the most popular methods for fuzzy c-means (FCM)-type clustering of data comprising numeric attributes. It assumes that the data derive from clusters of Gaussian form, a much more flexible construction than the spherical-cluster assumption of the original FCM. In this paper, we introduce an extension of the GG algorithm to allow for the effective handling of data with mixed numeric and categorical attributes. Traditionally, fuzzy clustering of such data is conducted by means of the fuzzy k-prototypes algorithm, which merely executes the original FCM algorithm with a different dissimilarity functional, one suitable for data with mixed numeric and categorical attributes. In contrast, in this work we provide a novel FCM-type algorithm employing a fully probabilistic dissimilarity functional for handling data with mixed-type attributes. Our approach utilizes a fuzzy objective function regularized by Kullback-Leibler (KL) divergence information, and is formulated on the basis of a set of probabilistic assumptions regarding the form of the derived clusters. We evaluate the efficacy of the proposed approach using benchmark data, and we compare it with competing fuzzy and non-fuzzy clustering algorithms.
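
The abstract does not give the update equations, so the following is only a minimal sketch of what a KL-regularized fuzzy clustering loop with a probabilistic dissimilarity might look like, not the paper's actual algorithm. It assumes (our assumptions, not the source's) that the dissimilarity is the negative log-likelihood of a Gaussian over the numeric attributes plus independent categorical distributions over the categorical attributes, and that the objective is sum(u * d) + lam * sum(u * log(u / pi)). The function name `kl_fuzzy_mixed` and all parameter names are hypothetical.

```python
import numpy as np

def kl_fuzzy_mixed(X_num, X_cat, n_clusters, lam=1.0, n_iter=50, seed=0):
    """Hypothetical sketch of KL-regularized fuzzy clustering on mixed data.

    X_num: (n, p) float array of numeric attributes.
    X_cat: (n, q) array of non-negative integer category codes.
    lam:   assumed weight of the KL regularization term.
    Returns memberships U (n, c) and cluster priors pi (c,).
    """
    rng = np.random.default_rng(seed)
    n, p = X_num.shape
    q = X_cat.shape[1]
    n_levels = [int(X_cat[:, a].max()) + 1 for a in range(q)]
    # Random initial memberships; each row sums to 1.
    U = rng.dirichlet(np.ones(n_clusters), size=n)
    for _ in range(n_iter):
        w = U.sum(axis=0) + 1e-12          # membership mass per cluster
        pi = w / w.sum()                   # cluster priors
        D = np.empty((n, n_clusters))      # dissimilarities d_ij
        for j in range(n_clusters):
            u = U[:, j]
            # Membership-weighted Gaussian parameters (numeric part).
            mu = (u @ X_num) / w[j]
            diff = X_num - mu
            cov = (diff * u[:, None]).T @ diff / w[j] + 1e-6 * np.eye(p)
            _, logdet = np.linalg.slogdet(cov)
            maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
            d = 0.5 * (maha + logdet + p * np.log(2 * np.pi))
            # Membership-weighted category frequencies (categorical part),
            # lightly smoothed so no category has zero probability.
            for a in range(q):
                counts = np.bincount(X_cat[:, a], weights=u,
                                     minlength=n_levels[a]) + 1e-3
                probs = counts / counts.sum()
                d -= np.log(probs[X_cat[:, a]])
            D[:, j] = d
        # Minimizer of the assumed objective: u_ij proportional to
        # pi_j * exp(-d_ij / lam), computed stably in log space.
        logits = np.log(pi) - D / lam
        logits -= logits.max(axis=1, keepdims=True)
        U = np.exp(logits)
        U /= U.sum(axis=1, keepdims=True)
    return U, pi
```

Under these assumptions, setting `lam=1.0` makes the membership update coincide with the posterior step of a mixture model, while larger values of `lam` give softer memberships. By comparison, the fuzzy k-prototypes approach mentioned in the abstract would keep the standard FCM updates and only swap in a mixed-type distance.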
