New Development in Cluster Analysis and Other Related Multivariate Analysis Methods.

Abstract

Cluster analysis is a multivariate analysis method aimed at (1) unraveling the natural groupings embedded in the data and (2) dimension reduction. Cluster analysis is widely applied across diverse modern research and business fields, including machine learning, bioinformatics, medical image analysis, pattern recognition, market research and global climate research, and many clustering algorithms have been developed to date. However, novel or special circumstances continue to call for better customized cluster analysis methods, which motivates this thesis.

This thesis consists of two parts. In the first part, we extend modern multiple-objective cluster analysis from a single set of features to multiple distinct sets of features by developing a novel compound clustering method and a constrained clustering method. We also develop a new statistic, the "complete linkage" R², which is used together with the well-known largest average silhouette criterion to determine the optimal number of clusters in compound clustering. The novel compound and constrained clustering methods are illustrated through a gene microarray study that combines gene expression data with gene function information.

In the second part, we propose a novel algorithm for weighted k-means clustering. Weighted k-means clustering is an extension of k-means clustering in which a nonnegative weight is assigned to each variable. We first derive the optimal variable weights for weighted k-means clustering in order to obtain more meaningful and interpretable clusters. We then improve the current weighted k-means clustering method (Huh and Lim 2009) by incorporating our novel algorithm, which uses the method of Lagrange multipliers and the Karush-Kuhn-Tucker conditions to obtain variable weights that are guaranteed to be globally optimal. We first present the theoretical formulation and derivation of the optimal weights, and then provide an iteration-based algorithm for computing them. Numerical examples on both simulated and well-known real data illustrate the method and show that it outperforms the original method in classification accuracy, stability and computational efficiency.
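
The "largest average silhouette" criterion mentioned above selects the number of clusters whose partition has the highest mean silhouette width, where each point's silhouette is s(i) = (b - a) / max(a, b), with a the mean distance to the other members of its own cluster and b the smallest mean distance to the members of any other cluster. The following is a minimal standard-library sketch that computes this mean for one partition, assuming Euclidean distance and the usual convention that a point in a singleton cluster has s(i) = 0. The function name is hypothetical and this is not code from the thesis; the "complete linkage" R² statistic is not defined in the abstract, so it is not sketched.

```python
import math
from typing import Hashable, Sequence


def average_silhouette(points: Sequence[Sequence[float]],
                       labels: Sequence[Hashable]) -> float:
    """Mean silhouette width of a partition (Euclidean distance assumed)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, (p, own) in enumerate(zip(points, labels)):
        # Sum of distances from point i to every other point, grouped by cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                sums[lab] += math.dist(p, q)
                counts[lab] += 1
        if counts[own] == 0:
            continue  # singleton cluster: s(i) is taken to be 0
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(average_silhouette(pts, [0, 0, 1, 1]))  # ≈ 0.90: well-separated clusters
    print(average_silhouette(pts, [0, 1, 0, 1]))  # ≈ -0.45: poor partition
```

To choose the number of clusters with this criterion, one would cluster the data for each candidate k and keep the k whose partition gives the largest returned value.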

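
How a Lagrange multiplier yields closed-form variable weights can be illustrated with a closely related, well-known formulation. This sketch is not the thesis's algorithm or that of Huh and Lim (2009); it swaps in the W-k-means weighting of Huang et al. (2005), which minimizes Σ_k Σ_{i∈C_k} Σ_j w_j^β (x_ij - c_kj)² subject to Σ_j w_j = 1, w_j ≥ 0 and β > 1. Writing D_j for the within-cluster dispersion of variable j, minimizing Σ_j w_j^β D_j under Σ_j w_j = 1 gives w_j = 1 / Σ_t (D_j / D_t)^(1/(β-1)). For β > 1 and positive D_j these weights are automatically positive, so the nonnegativity (KKT) constraints are inactive. The function names, the default β = 2, and the rule that a zero-dispersion variable gets weight 0 are assumptions of this sketch.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def weighted_sq_dist(x: Point, c: Point, w: Sequence[float], beta: float) -> float:
    """Sum over variables j of w_j**beta * (x_j - c_j)**2."""
    return sum((wj ** beta) * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def update_weights(disp: Sequence[float], beta: float) -> List[float]:
    """Closed-form Lagrange-multiplier weights summing to 1 (requires beta > 1)."""
    positive = [d for d in disp if d > 0]
    weights = []
    for d in disp:
        if d == 0:
            weights.append(0.0)  # assumption of this sketch: no weight for zero dispersion
        else:
            weights.append(1.0 / sum((d / t) ** (1.0 / (beta - 1.0)) for t in positive))
    return weights


def weighted_kmeans(data: Sequence[Point], k: int, beta: float = 2.0,
                    max_iter: int = 100, init: Optional[Sequence[Point]] = None,
                    seed: int = 0) -> Tuple[List[int], List[List[float]], List[float]]:
    """Hypothetical W-k-means-style loop: assign, update centers, update weights."""
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    p = len(data[0])
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(list(data), k)
    centers = [list(c) for c in start]
    weights = [1.0 / p] * p  # start from equal weights
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center under the weighted distance.
        new_labels = [min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], weights, beta))
                      for x in data]
        # Step 2: recompute each center as its members' mean (empty clusters keep their center).
        for c in range(k):
            members = [x for x, lab in zip(data, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Step 3: per-variable within-cluster dispersion D_j, then the closed-form weight update.
        disp = [sum((x[j] - centers[lab][j]) ** 2 for x, lab in zip(data, new_labels))
                for j in range(p)]
        weights = update_weights(disp, beta)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers, weights
```

With β = 1 the objective is linear in the weights, so its minimum over the simplex sits at a vertex and all weight goes to the single variable with the smallest dispersion; this is why such formulations require β > 1.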

Bibliographic record

Author: Zhang, Shaonan
Author affiliation: State University of New York at Stony Brook
Degree-granting institution: State University of New York at Stony Brook
Subjects: Statistics; Biostatistics; Applied mathematics
Degree: Ph.D.
Year: 2011
Pages: 121
Original format: PDF
Language: English

Similar literature

1. Application of k-means clustering, linear discriminant analysis and multivariate linear regression for the development of a predictive QSAR model on 5-lipoxygenase inhibitors [J]. Andrada Matias F., Vega-Hissi Esteban G., Estrada Mario R. Chemometrics and Intelligent Laboratory Systems, 2015.
2. Scientific classification of ripening period and development of colour grade chart for Indian mangoes (Mangifera indica L.) using multivariate cluster analysis [J]. Nambi V. Eyarkai, Thangavel K., Jesudas D. Manohar. Scientia Horticulturae, 2015.
3. Multivariate Clustered Data Analysis in Developmental Toxicity Studies [J]. G. Molenberghs, H. Geys. Statistica Neerlandica, 2001, no. 3.
4. Cluster Feature based Multivariate Data Analysis and Recovery Method for Renewable Energy Operation and Control [C]. Yi Li, Tongxun WANG, Meng TAN. IEEE International Conference on Energy Internet, 2020.
5. Characterization of hypertension through multivariate analysis utilizing linear and nonlinear methods [D]. Donnelly, Diane L., 2006.
6. Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods [O]. R R Gutell, A Power, G Z Hertz, 1992.
7. Survival analysis part II: multivariate data analysis--an introduction to concepts and methods [O]. Bradburn, MJ, Clark, TG, Love, SB, 2003.
8. Contributions to multivariate analysis including univariate and multivariate variance components analysis and factor analysis [R]. Ramanathan Gnanadesikan, 1956.
