Optimal Clustering and Cluster Identity in Understanding High-Dimensional Data Spaces with Tightly Distributed Points

Oliver Chikumbo; Vincent Granville

Abstract

The sensitivity of the elbow rule in determining an optimal number of clusters in high-dimensional spaces that are characterized by tightly distributed data points is demonstrated. The high-dimensional data samples are not artificially generated; they are taken from a real-world evolutionary many-objective optimization. They comprise Pareto fronts from the last 10 generations of an evolutionary optimization computation with 14 objective functions. The choice of Pareto fronts for analysis is strategic: it is intended to benefit the user who needs only one solution from the Pareto set to implement, so a systematic means of reducing the cardinality of solutions is imperative. Accordingly, this manuscript covers clustering the data and identifying the cluster from which to pick the desired solution, highlighting the implementation of the elbow rule and the use of hyper-radial distances for cluster identity. The Calinski-Harabasz statistic was favored for determining the criteria used in the elbow rule because of its robustness. The statistic takes into account both the variance within clusters and the variance between clusters. This exercise also opened an opportunity to revisit the justification for using the highest Calinski-Harabasz criterion to determine the optimal number of clusters for multivariate data. The elbow rule predicted the maximum end of the optimal number of clusters, and the highest Calinski-Harabasz criterion method favored the number of clusters at the lower end. Both results are used in a unique way for understanding high-dimensional data, despite being inconclusive regarding which of the two methods determines the true optimal number of clusters.
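The abstract describes the Calinski-Harabasz statistic only in words, as a balance of within-cluster and between-cluster variance. As a minimal sketch, assuming the standard textbook definition CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster sum of squares, W is the within-cluster sum of squares, k is the number of clusters and n is the number of points, the criterion for one labelled partition could be computed as follows. The function name `calinski_harabasz` and the toy data are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the authors' elbow-rule procedure or their hyper-radial distance.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Calinski-Harabasz criterion for one partition (hypothetical helper).

    points: sequence of equal-length numeric tuples
    labels: cluster label for each point, same length as points
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("the criterion needs 2 <= k < n clusters")

    def centroid(rows):
        return [sum(row[j] for row in rows) / len(rows) for j in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # B: size-weighted squared distance of centroids to the overall mean
    within = 0.0   # W: squared distance of points to their own cluster centroid
    for rows in groups.values():
        c = centroid(rows)
        between += len(rows) * sq_dist(c, overall)
        within += sum(sq_dist(row, c) for row in rows)

    if within == 0.0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data (assumed): two tight, well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_split = calinski_harabasz(toy_points, [0, 0, 1, 1])
poor_split = calinski_harabasz(toy_points, [0, 1, 0, 1])
print(good_split, poor_split)
```

On this toy data the partition that follows the two tight groups scores far higher than the one that cuts across them. Selecting the number of clusters with the highest criterion, as the abstract mentions, amounts to repeating this calculation for each candidate k and comparing the values.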

Bibliographic details

Source: Machine Learning and Knowledge Extraction, 2019, Issue 2, 30 pages
Authors: Oliver Chikumbo; Vincent Granville
Original format: PDF
Chinese Library Classification: Automation technology and equipment
Keywords: the elbow rule; Calinski-Harabasz criterion; Pareto front; evolutionary many-objective optimization; high-dimensional data; Sammon's nonlinear mapping; classical multi-dimensional scaling; hyper-radial distance

Similar literature

1. Local-Density Subspace Distributed Clustering for High-Dimensional Data [J]. Geng Yangli-ao, Li Qingyong, Liang Mingfei. IEEE Transactions on Parallel and Distributed Systems, 2020, Issue 8.
2. Clustering High-Dimensional Data Stream: A Survey on Subspace Clustering, Projected Clustering on Bioinformatics Applications (Advanced Science, Engineering and Medicine, Vol. 8(9), pp. 749–757 (2016)) [J]. Baghernia Ali, Pavin Hamid, Mirnabibaboli Miresmail. Advanced Science, Engineering and Medicine, 2017, Issue 7.
3. ERRATUM: Clustering High-Dimensional Data Stream: A Survey on Subspace Clustering, Projected Clustering on Bioinformatics Applications [J]. Ali Baghernia, Hamid Pavin, Miresmail Mirnabibaboli. Advanced Science, Engineering and Medicine, 2017, Issue 7.
4. Subspace search and visualization to make sense of alternative clusterings in high-dimensional data [C]. Tatu Andrada, Maas Fabian, Farber Ines. IEEE Conference on Visual Analytics Science & Technology 2012, 2012.
5. High-dimensional data mining: Subspace clustering, outlier detection and applications to classification [D]. Foss, Andrew Philip Ogilvie. 2010.
6. Dimensionality Reduction and Subspace Clustering in Mixed Reality for Condition Monitoring of High-Dimensional Production Data [O]. Burkhard Hoppenstedt, Manfred Reichert, Klaus Kammerer. 2019.
7. Subspace Search and Visualization to Make Sense of Alternative Clusterings in High-Dimensional Data [O]. Tatu Andrada, Maaß Fabian, Färber Ines. 2012.
