A Hierarchical Gamma Mixture Model-Based Method for Classification of High-Dimensional Data

Muhammad Azhar; Mark Junjie Li; Joshua Zhexue Huang

Abstract

Data classification is an important research topic in data mining. With the rapid growth of social media sites and IoT devices, data have grown tremendously in volume and complexity, producing many large, complex, high-dimensional data sets. Classifying such data when the number of classes is large remains a great challenge for current state-of-the-art methods. This paper presents a novel hierarchical gamma mixture model-based unsupervised method for classifying high-dimensional data with a large number of classes. In this method, we first partition the features of the data set into feature strata by using k-means. A set of subspace data sets is then generated from the feature strata by stratified subspace sampling. Next, the GMM Tree algorithm identifies the number of clusters and the initial cluster centers in each subspace data set, and these initial centers are passed to k-means to generate the base subspace clustering results. The subspace clustering results are integrated into an object cluster association (OCA) matrix by using a link-based method. The ensemble clustering result is generated from the OCA matrix by k-means, with the number of clusters identified by the GMM Tree algorithm. The purity of each cluster in the ensemble result is then computed, and the dominant class label is assigned to the cluster. A new object is classified by computing its distance to the center of each cluster in the classifier and assigning it the class label of the cluster at the shortest distance. A series of experiments was conducted on twelve synthetic and eight real-world data sets with different numbers of classes, features, and objects. The experimental results show that the new method outperforms other state-of-the-art techniques on most of the data sets.
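
The last two steps of the abstract (assigning each cluster its dominant class label via purity, then classifying a new object by its nearest cluster center) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `label_clusters`, `cluster_centers`, and `classify` are hypothetical, the toy data are made up for illustration, and Euclidean distance is an assumption, since the abstract does not name the distance measure.

```python
from collections import Counter
import math


def label_clusters(cluster_ids, class_labels):
    """Map each cluster id to (dominant class label, purity)."""
    members = {}
    for cid, label in zip(cluster_ids, class_labels):
        members.setdefault(cid, []).append(label)
    labelled = {}
    for cid, labels in members.items():
        # Purity is the share of the cluster held by its most common class.
        dominant, count = Counter(labels).most_common(1)[0]
        labelled[cid] = (dominant, count / len(labels))
    return labelled


def cluster_centers(points, cluster_ids):
    """Compute the mean vector of the points in each cluster."""
    sums, counts = {}, Counter(cluster_ids)
    for point, cid in zip(points, cluster_ids):
        acc = sums.setdefault(cid, [0.0] * len(point))
        for j, value in enumerate(point):
            acc[j] += value
    return {cid: [s / counts[cid] for s in acc] for cid, acc in sums.items()}


def classify(new_point, centers, labelled):
    """Return the class label of the cluster whose center is nearest.

    Euclidean distance is assumed here.
    """
    nearest = min(centers, key=lambda cid: math.dist(new_point, centers[cid]))
    return labelled[nearest][0]


# Toy data: five 2-D objects, an ensemble clustering into two clusters,
# and the known class label of each object (all values are illustrative).
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9]]
cluster_ids = [0, 0, 0, 1, 1]
class_labels = ["a", "a", "b", "b", "b"]

labelled = label_clusters(cluster_ids, class_labels)
centers = cluster_centers(points, cluster_ids)
prediction = classify([4.8, 5.0], centers, labelled)
```

In this toy run, cluster 0 receives label "a" with purity 2/3 and cluster 1 receives label "b" with purity 1, so the new object near (5, 5) is classified as "b". The cluster assignments are taken as given here; in the method described above they would come from the ensemble clustering of the OCA matrix.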

Bibliographic details

Source
Entropy | 2019, Issue 9 | 21 pages
Authors
Muhammad Azhar; Mark Junjie Li; Joshua Zhexue Huang
Original format
PDF
CLC classification
Physiology
Keywords
data mining; unsupervised classification; decision cluster; gamma mixture model; expectation maximization; high-dimensional data; curse of dimensionality
