Data Bubbles for Non-Vector Data: Speeding-up Hierarchical Clustering in Arbitrary Metric Spaces


Abstract

To speed up clustering algorithms, data summarization methods have been proposed that first summarize the data set by computing suitable representative objects. A clustering algorithm is then applied to these representatives only, and a clustering structure for the whole data set is derived from the result for the representatives. Most previous methods are, however, limited in their application domain. They are generally based on sufficient statistics, such as the linear sum of a set of points, which assume that the data come from a vector space. In many important applications, on the other hand, the data come from a metric non-vector space, and only distances between objects can be exploited to construct effective data summarizations. In this paper, we develop a new data summarization method, based only on distance information, that can be applied directly to non-vector data. An extensive performance evaluation shows that our method is very effective in finding the hierarchical clustering structure of non-vector data using only a very small number of data summarizations, resulting in a large reduction of runtime while sacrificing very little clustering quality.
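The summarize-then-cluster pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's Data Bubbles construction. It is a simplified stand-in that assumes randomly sampled representatives, nearest-representative assignment, and a naive single-link agglomerative step, all driven only by a distance function. Every function name here (`edit_distance`, `summarize`, `single_link`, `cluster_via_summaries`) is hypothetical and chosen for illustration; string edit distance serves as an example of a metric on non-vector data.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a metric on strings, which have no vector form."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def summarize(data: Sequence[T], dist: Callable[[T, T], float],
              k: int, seed: int = 0) -> Dict[int, List[int]]:
    """Sample k objects as representatives (an assumption of this sketch)
    and assign every object to its nearest representative by distance only."""
    rng = random.Random(seed)
    reps = rng.sample(range(len(data)), k)
    members: Dict[int, List[int]] = {r: [] for r in reps}
    for i, obj in enumerate(data):
        nearest = min(reps, key=lambda r: dist(obj, data[r]))
        members[nearest].append(i)
    return members


def single_link(reps: List[int], data: Sequence[T],
                dist: Callable[[T, T], float], n_clusters: int) -> List[List[int]]:
    """Naive single-link agglomerative clustering over representatives only."""
    clusters = [[r] for r in reps]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(data[x], data[y])
                        for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


def cluster_via_summaries(data: Sequence[T], dist: Callable[[T, T], float],
                          k: int, n_clusters: int, seed: int = 0) -> List[int]:
    """Cluster the representatives, then give each object its representative's label."""
    members = summarize(data, dist, k, seed)
    labels = [-1] * len(data)
    groups = single_link(list(members), data, dist, n_clusters)
    for label, group in enumerate(groups):
        for rep in group:
            for i in members[rep]:
                labels[i] = label
    return labels
```

Calling `cluster_via_summaries(words, edit_distance, k, n_clusters)` on a list of strings returns one cluster label per string. The clustering step compares only the `k` representatives, so its cost depends on `k` rather than on the full data set size, which is the source of the runtime saving; the quality of the result depends on how well the sampled representatives cover the data.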

Bibliographic Details

Source: International Conference on Very Large Databases | 2003 | 12 pages
Authors: Jianjun Zhou; Joerg Sander
Original format: PDF
Chinese Library Classification: Automation system theory

Similar Literature


1. Detecting Clinically Meaningful Shape Clusters in Medical Image Data: Metrics Analysis for Hierarchical Clustering Applied to Healthy and Pathological Aortic Arches [J] . Jan L. Bruse, Maria A. Zuluaga, Abbas Khushnood . IEEE Transactions on Biomedical Engineering . 2017, No. 10
2. Scalable Hyperspace Partitioning Based Data Preprocessing Algorithm for Distance-Metric Based Clustering in Data Mining [J] . Manju Pandey, Ravi K. Jade . International Journal of Applied Engineering Research . 2017, No. 12aPta3
3. Space-Time Hierarchical Clustering for Identifying Clusters in Spatiotemporal Point Data [J] . Progress in Artificial Intelligence . 2020, No. 2
4. Data Bubbles for Non-Vector Data: Speeding-up Hierarchical Clustering in Arbitrary Metric Spaces [C] . Jianjun Zhou, Joerg Sander . Twenty-ninth International Conference on Very Large Databases; Sep 9-12, 2003; Berlin, Germany . 2003
5. Temporal Clustering of Finite Metric Spaces and Spectral k-Clustering [D] . Rossi, Alfred V. 2017
6. The extension of the largest generalized-eigenvalue based distance metric Dij(γ1) in arbitrary feature spaces to classify composite data points [O] . Mosaab Daoud . 2019
7. Clustering Large Datasets in Arbitrary Metric Spaces [O] . 2013
8. Clustering Large Datasets in Arbitrary Metric Spaces [R] . Ganti, V., Ramakrishnan, R., Gehrke, J. 2006
