
Probabilistic model-based clustering of complex data.


Abstract

In many emerging data mining applications, one needs to cluster complex data such as very high-dimensional sparse text documents and continuous or discrete time sequences. Probabilistic model-based clustering techniques have shown promising results in many such applications. For real-valued low-dimensional vector data, Gaussian models have been frequently used. For very high-dimensional vector and non-vector data, model-based clustering is a natural choice when it is difficult to extract good features or identify an appropriate measure of similarity between pairs of data objects.

This dissertation presents a unified framework for model-based clustering based on a bipartite graph view of data and models. The framework includes an information-theoretic analysis of model-based partitional clustering from a deterministic annealing point of view and a view of model-based hierarchical clustering that leads to several useful extensions. The framework is used to develop two new variations of model-based clustering: a balanced model-based partitional clustering algorithm that produces clusters of comparable sizes, and a hybrid model-based clustering approach that combines the advantages of partitional and hierarchical model-based algorithms.

I apply the framework and new clustering algorithms to cluster several distinct types of complex data, ranging from arbitrary-shaped 2-D synthetic data to high-dimensional documents, EEG time series, and gene expression time sequences. The empirical results demonstrate the usefulness of the scalable, balanced model-based clustering algorithms, as well as the benefits of the hybrid model-based clustering approach. They also showcase the generality of the proposed clustering framework.
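To make the idea of model-based partitional clustering viewed through deterministic annealing more concrete, here is a minimal sketch in Python. It is not the dissertation's algorithm: it assumes unit-variance spherical Gaussian cluster models, a hand-picked temperature schedule, and hypothetical function names (`soft_assign`, `anneal_cluster`) chosen only for illustration. At high temperature the data-to-model assignments are soft; as the temperature drops they harden toward a partition.

```python
import numpy as np

def soft_assign(X, means, temperature):
    """Soft data-to-model assignments P(model | x) at a given temperature."""
    # Squared distance from every point to every model mean. For unit-variance
    # spherical Gaussians (an assumption of this sketch) the log-likelihood is
    # -0.5 * squared distance plus a constant that cancels on normalization.
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    logits = -0.5 * sq_dist / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)

def anneal_cluster(X, k, temperatures=(8.0, 4.0, 2.0, 1.0, 0.5, 0.1),
                   iters=20, seed=0):
    """Fit k Gaussian means while lowering the temperature step by step."""
    rng = np.random.default_rng(seed)
    # Initialize the models at k distinct data points.
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    for t in temperatures:  # illustrative schedule, not from the source
        for _ in range(iters):
            resp = soft_assign(X, means, t)            # assignment step
            weights = np.maximum(resp.sum(axis=0), 1e-12)
            means = (resp.T @ X) / weights[:, None]    # model re-estimation
    # Harden the final soft assignments into a partition.
    labels = soft_assign(X, means, temperatures[-1]).argmax(axis=1)
    return means, labels
```

Calling `anneal_cluster(X, k=2)` on an `(n, d)` array returns the fitted means and one hard label per row. Replacing the Gaussian log-likelihood with that of another model family (for example a sequence model) is where a model-based approach would differ from plain distance-based clustering.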

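Bibliographic record

  • Author

    Zhong, Shi

  • Author affiliation

    The University of Texas at Austin

  • Degree-granting institution: The University of Texas at Austin
  • Discipline: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2003
  • Page: p.6260
  • Total pages: 183
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Radio electronics; telecommunications technology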
