Information Sciences: An International Journal

Unsupervised clustering and feature weighting based on Generalized Dirichlet mixture modeling



Abstract

We propose a possibilistic approach to Generalized Dirichlet (GD) mixture parameter estimation, data clustering, and feature weighting. The proposed algorithm, called Robust and Unsupervised Learning of Finite Generalized Dirichlet Mixture Models (RULe_GDM), exploits a property of the Generalized Dirichlet distribution: the data can be transformed so that the features become independent and each follows a Beta distribution. RULe_GDM then learns optimal relevance weights for each feature within each cluster. This property makes RULe_GDM suitable for noisy and high-dimensional feature spaces. In addition, RULe_GDM associates two types of membership with each data sample. The first is the posterior probability, which indicates how well the sample fits each estimated distribution. The second represents the sample's degree of typicality and is used to identify and discard noise points and outliers. RULe_GDM minimizes a single objective function that jointly learns the two membership functions, the distribution parameters, and the relevance weights of each feature within each distribution. We also extend the algorithm to find the optimal number of clusters in an unsupervised and efficient way by exploiting properties of the possibilistic membership function. The performance of RULe_GDM is illustrated and compared to that of similar algorithms. We use synthetic data to illustrate its robustness to noisy and high-dimensional features, and we compare our approach to other relevant algorithms on several standard data sets.
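The data transformation the abstract relies on can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code; the function names `gd_to_beta_coords` and `beta_coords_to_gd` are hypothetical. It assumes each sample is a d-dimensional vector of positive entries summing to less than 1 and uses the standard stick-breaking coordinates V_1 = X_1, V_l = X_l / (1 - X_1 - ... - X_{l-1}). When X follows a GD distribution with parameters (alpha_l, beta_l), the V_l are mutually independent with V_l ~ Beta(alpha_l, beta_l).

```python
import numpy as np

def gd_to_beta_coords(X):
    """Map GD-distributed rows X (n, d) to stick-breaking coordinates V (n, d).

    V_1 = X_1 and V_l = X_l / (1 - X_1 - ... - X_{l-1}).
    Assumes positive entries with each row summing to less than 1.
    """
    X = np.asarray(X, dtype=float)
    preceding = np.cumsum(X, axis=1) - X   # X_1 + ... + X_{l-1}; zero for l = 1
    return X / (1.0 - preceding)

def beta_coords_to_gd(V):
    """Inverse map: X_l = V_l * prod_{k<l} (1 - V_k)."""
    V = np.asarray(V, dtype=float)
    ones = np.ones((V.shape[0], 1))
    # remaining "stick" length before each coordinate
    remaining = np.cumprod(np.hstack([ones, 1.0 - V[:, :-1]]), axis=1)
    return V * remaining

# Sample GD vectors by drawing independent Betas and mapping them back.
rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 1.5])   # illustrative parameters only
beta = np.array([5.0, 4.0, 2.0])
V = rng.beta(alpha, beta, size=(1000, 3))
X = beta_coords_to_gd(V)
assert np.all(X.sum(axis=1) < 1.0)
assert np.allclose(gd_to_beta_coords(X), V)   # round trip recovers the Betas
```

Because the transformed coordinates are independent, a cluster's log-likelihood factorizes into a sum of per-feature Beta terms, which is the setting in which a separate relevance weight can be attached to each feature.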
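The abstract does not give the typicality update that RULe_GDM uses. As a point of reference only, the classic possibilistic c-means (PCM) membership of Krishnapuram and Keller can be sketched as below. The function names, the per-cluster scale `eta`, and the 0.1 noise threshold are illustrative assumptions, and `D` stands for any nonnegative point-to-cluster dissimilarity (a squared distance in the original PCM).

```python
import numpy as np

def pcm_typicality(D, eta, m=2.0):
    """Classic PCM typicality: u_ij = 1 / (1 + (d_ij / eta_j) ** (1 / (m - 1))).

    D: (n, c) nonnegative dissimilarities of n points to c clusters.
    eta: (c,) positive per-cluster scale parameters (assumed given here).
    m: fuzzifier, m > 1.
    """
    D = np.asarray(D, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + (D / eta) ** (1.0 / (m - 1.0)))

def flag_outliers(U, threshold=0.1):
    """Hypothetical rule: a point is noise if it is atypical of every cluster."""
    return U.max(axis=1) < threshold

D = np.array([[0.0, 4.0],
              [1.0, 1.0],
              [50.0, 60.0]])
U = pcm_typicality(D, eta=[1.0, 1.0])
noise = flag_outliers(U)
```

Unlike posterior probabilities, these memberships are not constrained to sum to 1 across clusters, so a point far from every cluster receives low typicality everywhere; that is what allows noise points and outliers to be flagged and discarded.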
