High-Dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products.

Abstract

With the advancement of modern technology, data have expanded in two dimensions: the number of variables and the number of observations. This high dimensionality and large data volume pose new challenges for statistical analysis. This thesis considers two problems related to cluster analysis: high-dimensional data clustering and statistical analysis of clustering-based data summarization products.

High dimensionality often makes traditional clustering methods ineffective. Variable selection is a common approach to reducing the dimensionality of data for better cluster analysis. Most recently developed methods perform variable selection, either explicitly or implicitly, based on a variable importance (VI) measure. In this thesis, an algorithmic framework is introduced that iterates between constructing a VI measure and performing variable selection, each conditional on the other. Within this framework, we develop an ensemble VI, constructed by averaging a set of VIs. Both theoretical and simulation studies show that the proposed ensemble VI has better variable selection performance than a non-ensemble VI and is robust to the choice of the number of groups in cluster analysis. In addition, we propose a new VI-based variable selection method that selects a set of variables by sequentially testing for the existence of group structure in the data. Its effectiveness is demonstrated through a simulation study and a real-data application.

In the second problem, we consider a histogram-type data summarization product derived from massive climate data. Unlike the traditional data reduction method, which summarizes the observations in each spatial grid box over a period of time by their average, a NASA science team recently used a multivariate histogram, constructed by cluster analysis, to represent those observations. This method has been applied to observations collected by the Atmospheric Infrared Sounder (AIRS) to produce the AIRS L3Q products. In this thesis, we study statistical tools based on pairwise dissimilarities that are suitable for analyzing this histogram-type data. Through theoretical analysis and simulations, we investigate several dissimilarity measures and find that the Mallows distance is preferable to the others when the locations of the representative vectors are important for the analysis. We apply multidimensional scaling and clustering methods to analyze the AIRS data collected in December 2002. The results of these studies show the effectiveness of statistical methods based on the Mallows distance in extracting information from this histogram-type data.
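The abstract's point about representative-vector locations can be illustrated with a small sketch. This is not the thesis's implementation: it assumes a one-dimensional simplification in which each histogram is a list of representative values with weights summing to 1, and the function name `mallows_distance_1d` is hypothetical. In one dimension the optimal matching pairs mass in sorted order; the multivariate histograms discussed in the thesis would instead require solving a general transport problem.

```python
def mallows_distance_1d(points_a, weights_a, points_b, weights_b, p=2):
    """Mallows (p-Wasserstein) distance between two weighted 1-D point sets.

    Assumption: both weight lists are non-negative and sum to 1.
    In one dimension, moving mass in sorted order is optimal for p >= 1.
    """
    a = sorted(zip(points_a, weights_a))
    b = sorted(zip(points_b, weights_b))
    i = j = 0
    rem_a, rem_b = a[0][1], b[0][1]  # mass still unmatched at current points
    cost = 0.0
    eps = 1e-12  # tolerance for floating-point leftovers
    while i < len(a) and j < len(b):
        moved = min(rem_a, rem_b)
        cost += moved * abs(a[i][0] - b[j][0]) ** p
        rem_a -= moved
        rem_b -= moved
        if rem_a <= eps:  # current point of a is exhausted
            i += 1
            rem_a = a[i][1] if i < len(a) else 0.0
        if rem_b <= eps:  # current point of b is exhausted
            j += 1
            rem_b = b[j][1] if j < len(b) else 0.0
    return cost ** (1.0 / p)


# Three histograms with identical weights but different representative locations.
base = ([0.0, 1.0], [0.5, 0.5])
near = ([0.1, 1.1], [0.5, 0.5])
far = ([2.0, 3.0], [0.5, 0.5])

d_near = mallows_distance_1d(*base, *near)
d_far = mallows_distance_1d(*base, *far)
```

A dissimilarity that compares only the weights would treat `near` and `far` as equally close to `base`, whereas the distance above grows with how far the representatives have moved.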


Bibliographic record

Author: Zhou, Dunke
Author affiliation: The Ohio State University
Degree-granting institution: The Ohio State University
Subject: Statistics
Degree: Ph.D.
Year: 2012
Pages: 123 p.
Original format: PDF
Language: English
Date indexed: 2022-08-17 11:43:31

