Probabilistic model-based clustering of complex data.

Abstract

In many emerging data mining applications, one needs to cluster complex data such as very high-dimensional sparse text documents and continuous or discrete time sequences. Probabilistic model-based clustering techniques have shown promising results in many such applications. For real-valued low-dimensional vector data, Gaussian models have been frequently used. For very high-dimensional vector and non-vector data, model-based clustering is a natural choice when it is difficult to extract good features or identify an appropriate measure of similarity between pairs of data objects.

This dissertation presents a unified framework for model-based clustering based on a bipartite graph view of data and models. The framework includes an information-theoretic analysis of model-based partitional clustering from a deterministic annealing point of view and a view of model-based hierarchical clustering that leads to several useful extensions. The framework is used to develop two new variations of model-based clustering: a balanced model-based partitional clustering algorithm that produces clusters of comparable sizes, and a hybrid model-based clustering approach that combines the advantages of partitional and hierarchical model-based algorithms.

I apply the framework and new clustering algorithms to cluster several distinct types of complex data, ranging from arbitrary-shaped 2-D synthetic data to high-dimensional documents, EEG time series, and gene expression time sequences. The empirical results demonstrate the usefulness of the scalable, balanced model-based clustering algorithms, as well as the benefits of the hybrid model-based clustering approach. They also showcase the generality of the proposed clustering framework.

Bibliographic record

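Author: Zhong, Shi
Author affiliation: The University of Texas at Austin
Degree-granting institution: The University of Texas at Austin
Subject: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2003
Pages: p. 6260
Total pages: 183
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics, telecommunications technology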

Similar literature

1. Variable selection for model-based high-dimensional clustering and its application to microarray data. [J]. Wang S, Zhu J. Biometrics: Journal of the Biometric Society: An International Society Devoted to the Mathematical and Statistical Aspects of Biology. 2008, no. 2.
2. Model-based clustering of meta-analytic functional imaging data. [J]. Neumann J, von-Cramon DY, Lohmann G. Human Brain Mapping. 2008, no. 2.
3. Model-based clustering and data transformations for gene expression data. [J]. Yeung KY, Fraley C, Murua A, Bioinformatics. 2001, no. 10.
4. Model-based Clustering With Probabilistic Constraints. [C]. Martin H. C. Law, Alexander Topchy, Anil K. Jain. SIAM International Conference on Data Mining. 2005.
5. Real-time probabilistic contaminant source identification and model-based event detection algorithms. [D]. Yang, Xueyao. 2013.
6. ViVaMBC: estimating viral sequence variation in complex populations from Illumina deep-sequencing data using model-based clustering. [O]. Bie Verbist, Lieven Clement, Joke Reumers, 2015.
7. Model-based regression clustering for high-dimensional data. Application to functional data. [O]. Devijver, Emilie. 2016.
8. Species-richness of the Anopheles annulipes Complex (Diptera: Culicidae) Revealed by Tree and Model-Based Allozyme Clustering Analyses. [R]. Foley, D. H., Bryan, J. H., Wilkerson, R. C. 2007.
