Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding

Murtagh F; Downs G; Contreras P


Abstract

Coding of data, usually upstream of data analysis, has crucial implications for the data analysis results. By modifying the data coding, through use of less than full precision in data values, we can appreciably aid the effectiveness and efficiency of the hierarchical clustering. In our first application, this is used to lessen the quantity of data to be hierarchically clustered. The approach is a hybrid one, based on hashing and on the Ward minimum variance agglomerative criterion. In our second application, we derive a hierarchical clustering from relationships between sets of observations, rather than the traditional use of relationships between the observations themselves. This second application uses embedding in a Baire space, or longest common prefix ultrametric space. We compare this second approach, which is of O(n log n) complexity, to k-means.
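The longest common prefix idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper names (`digit_string`, `baire_distance`, `prefix_hierarchy`), the restriction to values in [0, 1), the decimal digit coding, and the base 2 in the distance are all assumptions made for illustration. Each value is truncated to a fixed number of digits (less than full precision), and values sharing a digit prefix of length k fall into the same cluster at level k.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN


def digit_string(x, precision):
    """First `precision` decimal digits of x in [0, 1), truncated (assumed coding)."""
    step = Decimal(1).scaleb(-precision)  # e.g. 0.0001 for precision=4
    q = Decimal(str(x)).quantize(step, rounding=ROUND_DOWN)
    return format(q, "f").split(".")[1]


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(x, y, precision=4, base=2.0):
    """base**(-k), where k is the longest common prefix length of the digit codes."""
    k = common_prefix_len(digit_string(x, precision), digit_string(y, precision))
    return base ** (-k)


def prefix_hierarchy(values, precision=4):
    """One partition per prefix length: bucket values by their shared digit prefix."""
    levels = []
    for k in range(1, precision + 1):
        buckets = defaultdict(list)
        for v in values:
            buckets[digit_string(v, precision)[:k]].append(v)
        levels.append(dict(buckets))
    return levels


values = [0.1234, 0.1278, 0.1911, 0.5012]
levels = prefix_hierarchy(values, precision=4)
```

In this sketch, each level is built by a single hashing pass over the data, and no pairwise distances between observations are computed. The nested partitions in `levels` form a hierarchy because a shared prefix of length k + 1 implies a shared prefix of length k.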


Bibliographic details

Source
SIAM Journal on Scientific Computing | 2009, Issue 2 | 24 pages
Authors
Murtagh F; Downs G; Contreras P
Original format: PDF
Language: English
Chinese Library Classification: Computational mathematics
Keywords
hierarchical clustering; ultrametric; tree distance; partitioning; hashing;


Similar literature


1. Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding [J]. Murtagh F, Downs G, Contreras P. SIAM Journal on Scientific Computing. 2009, Issue 2
2. Hierarchical Clustering Of Subpopulations With A Dissimilarity Based On The Likelihood Ratio Statistic: Application To Clustering Massive Data Sets [J]. Antonio Ciampi, Yves Lechevallier, Manuel Castejon Limas. Pattern Analysis and Applications. 2008, Issue 2
3. Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval Using the Baire Metric, with Linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement [J]. F. Murtagh, P. Contreras. P-Adic Numbers, Ultrametric Analysis and Applications. 2012, Issue 1
4. Cluster Center Initialization Using Hierarchical Two-Division of a Data Set along Each Dimension [C]. Guang Hui Chen. Advances in Computer Science and Information Engineering. 2012
5. Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on GPU clusters [D]. Dashti, Ali. 2013
6. Blind method for discovering number of clusters in multidimensional datasets by regression on linkage hierarchies generated from random data [O]. Osbert C. Zalay. 2020
7. Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding [O]. Murtagh Fionn, Contreras Albornoz Pedro, Downs Geoff. 2008
8. Statistical Analysis of Very High-Dimensional Data Sets of Hierarchically Structured Binary Variables with Missing Data and Application to Marine Corps Readiness Evaluations [R]. Zacks, S., Marlow, W. H., Brier, S. S. 1983
