New Development in Cluster Analysis and Other Related Multivariate Analysis Methods.

Abstract

Cluster analysis is a multivariate analysis method aimed at (1) unraveling the natural groupings embedded within the data, and (2) dimension reduction. Cluster analysis is widely applied across modern research and business fields, including machine learning, bioinformatics, medical image analysis, pattern recognition, market research, and global climate research, and many clustering algorithms have been developed to date. However, novel or special circumstances continue to call for better-customized cluster analysis methods, which motivates this thesis.

This thesis consists of two parts. In the first part, we extend modern multiple-objective cluster analysis from a single set of features to multiple distinct sets of features by developing the novel compound clustering method and the constrained clustering method. We also develop a new statistic, the "complete linkage" R², and use it along with the well-known largest average silhouette to determine the optimal number of clusters in compound clustering. The novel compound and constrained clustering methods are illustrated through a gene microarray study with both gene expression data and gene function information.

In the second part of this thesis, we propose a novel algorithm for weighted k-means clustering. Weighted k-means clustering is an extension of k-means clustering in which a set of nonnegative weights is assigned to the variables. We first derive the optimal variable weights for weighted k-means clustering in order to obtain more meaningful and interpretable clusters. We then improve the current weighted k-means clustering method (Huh and Lim 2009) by incorporating our novel algorithm, which uses the method of Lagrange multipliers and the Karush-Kuhn-Tucker conditions to obtain variable weights that are guaranteed to be globally optimal. We first present the related theoretical formulation and the derivation of the optimal weights, and then provide an iterative algorithm to compute them. Numerical examples on both simulated and well-known real data illustrate our method. Our method is shown to outperform the originally proposed method in classification accuracy, stability, and computational efficiency.
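
The abstract does not state the thesis's objective function, so the following is only a minimal sketch of how variable weights can be obtained in closed form with a Lagrange multiplier. It assumes a commonly used formulation, not necessarily the one in the thesis or in Huh and Lim (2009): minimize sum_j w_j^beta * D_j subject to sum_j w_j = 1 and w_j >= 0, where D_j is the within-cluster dispersion of variable j and beta > 1. Setting the derivative of the Lagrangian to zero gives w_j proportional to D_j^(-1/(beta-1)), and because the objective is convex in the weights for beta > 1, this stationary point is the global minimizer for fixed clusters. The names `update_weights` and `weighted_kmeans` and the parameter `beta` are hypothetical.

```python
import random


def update_weights(dispersions, beta=2.0, eps=1e-12):
    """Closed-form weights minimizing sum_j w_j**beta * D_j with sum_j w_j = 1.

    Assumed objective (not taken from the thesis); requires beta > 1.
    """
    if beta <= 1.0:
        raise ValueError("beta must be greater than 1")
    # Guard against zero dispersion so the negative power stays finite.
    d = [max(x, eps) for x in dispersions]
    inv = [x ** (-1.0 / (beta - 1.0)) for x in d]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_kmeans(points, k, beta=2.0, n_iter=20, seed=0):
    """Alternate assignment, centroid, and variable-weight updates."""
    rng = random.Random(seed)
    n_vars = len(points[0])
    centers = [list(p) for p in rng.sample(points, k)]
    weights = [1.0 / n_vars] * n_vars
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center under the weighted squared distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum(
                    (w ** beta) * (x - m) ** 2
                    for w, x, m in zip(weights, p, centers[c])
                ),
            )
        # Centroid step: per-cluster mean (an empty cluster keeps its center).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Weight step: per-variable within-cluster dispersion, then closed form.
        disp = [
            sum((p[j] - centers[lab][j]) ** 2 for p, lab in zip(points, labels))
            for j in range(n_vars)
        ]
        weights = update_weights(disp, beta)
    return labels, centers, weights
```

Under this assumed objective, variables with small within-cluster dispersion receive large weights, so the result can depend strongly on the initial centers; the sketch makes no claim about the accuracy or stability results reported in the thesis.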

Bibliographic record

  • Author: Zhang, Shaonan
  • Author affiliation: State University of New York at Stony Brook
  • Degree-granting institution: State University of New York at Stony Brook
  • Subject: Statistics; Biostatistics; Applied mathematics
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 121
  • Original format: PDF
  • Language: English
