Data Bubbles for Non-Vector Data: Speeding-up Hierarchical Clustering in Arbitrary Metric Spaces

Abstract

To speed up clustering algorithms, data summarization methods have been proposed, which first summarize the data set by computing suitable representative objects. Then, a clustering algorithm is applied to these representatives only, and a clustering structure for the whole data set is derived, based on the result for the representatives. Most previous methods are, however, limited in their application domain. They are in general based on sufficient statistics such as the linear sum of a set of points, which assumes that the data is from a vector space. On the other hand, in many important applications, the data is from a metric non-vector space, and only distances between objects can be exploited to construct effective data summarizations. In this paper, we develop a new data summarization method based only on distance information that can be applied directly to non-vector data. An extensive performance evaluation shows that our method is very effective in finding the hierarchical clustering structure of non-vector data using only a very small number of data summarizations, thus resulting in a large reduction of runtime while trading only very little clustering quality.
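
As a rough illustration of the distance-only idea described in the abstract, the sketch below summarizes a set of strings using nothing but a metric (edit distance). It is not the authors' Data Bubbles method: the random choice of representatives, the max-distance radius, and the names `edit_distance` and `summarize` are all assumptions made for this example.

```python
import random
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a metric on strings that needs no vector form."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def summarize(objects, k, dist, seed=0):
    """Hypothetical summarization: pick k random representatives, assign each
    object to its nearest one, and keep only distance-based statistics."""
    rng = random.Random(seed)
    reps = rng.sample(objects, k)
    groups = defaultdict(list)
    for obj in objects:
        nearest = min(range(k), key=lambda i: dist(obj, reps[i]))
        groups[nearest].append(obj)
    summaries = []
    for i, rep in enumerate(reps):
        members = groups[i]
        # Radius here is simply the largest distance from the representative
        # to any assigned object (an assumption, chosen for simplicity).
        radius = max((dist(rep, m) for m in members), default=0)
        summaries.append({"rep": rep, "count": len(members), "radius": radius})
    return summaries


words = ["kitten", "sitten", "sitting", "mitten",
         "banana", "bandana", "cabana", "bananas"]
summaries = summarize(words, k=2, dist=edit_distance)
```

A hierarchical clustering algorithm would then run on the `k` summaries instead of on all objects, which is where the runtime saving described above would come from.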

Bibliographic details

Source
Twenty-ninth International Conference on Very Large Databases; Sep 9-12, 2003; Berlin, Germany | 2003 | pp. 452-463 | 12 pages
Conference location: Berlin (DE)
Authors
Jianjun Zhou; Joerg Sander
Author affiliation
University of Alberta, Department of Computing Science, Edmonton, Alberta, Canada T6G 2E8
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology
Date indexed: 2022-08-26 14:15:35

Similar literature

1. Detecting Clinically Meaningful Shape Clusters in Medical Image Data: Metrics Analysis for Hierarchical Clustering Applied to Healthy and Pathological Aortic Arches [J]. Jan L. Bruse, Maria A. Zuluaga, Abbas Khushnood. IEEE Transactions on Biomedical Engineering. 2017, No. 10
2. Scalable Hyperspace Partitioning Based Data Preprocessing Algorithm for Distance-Metric Based Clustering in Data Mining [J]. Manju Pandey, Ravi K. Jade. International Journal of Applied Engineering Research. 2017, No. 12aPta3
3. Space-Time Hierarchical Clustering for Identifying Clusters in Spatiotemporal Point Data [J]. Progress in Artificial Intelligence. 2020, No. 2
4. Data Bubbles for Non-Vector Data: Speeding-up Hierarchical Clustering in Arbitrary Metric Spaces [C]. Jianjun Zhou, Joerg Sander. International Conference on Very Large Databases. 2003
5. Temporal Clustering of Finite Metric Spaces and Spectral k-Clustering [D]. Rossi, Alfred V. 2017
6. The extension of the largest generalized-eigenvalue based distance metric Dij(γ1) in arbitrary feature spaces to classify composite data points [O]. Mosaab Daoud. 2019
7. Clustering Large Datasets in Arbitrary Metric Spaces [O]. 2013
8. Clustering Large Datasets in Arbitrary Metric Spaces [R]. Ganti, V., Ramakrishnan, R., Gehrke, J. 2006
