Twenty-ninth International Conference on Very Large Databases; Sep 9-12, 2003; Berlin, Germany

Data Bubbles for Non-Vector Data: Speeding-up Hierarchical Clustering in Arbitrary Metric Spaces



Abstract

To speed up clustering algorithms, data summarization methods have been proposed that first summarize the data set by computing suitable representative objects. A clustering algorithm is then applied to these representatives only, and a clustering structure for the whole data set is derived from the result for the representatives. Most previous methods are, however, limited in their application domain. They are in general based on sufficient statistics such as the linear sum of a set of points, which assumes that the data is from a vector space. In many important applications, on the other hand, the data is from a metric non-vector space, and only distances between objects can be exploited to construct effective data summarizations. In this paper, we develop a new data summarization method based only on distance information that can be applied directly to non-vector data. An extensive performance evaluation shows that our method is very effective in finding the hierarchical clustering structure of non-vector data using only a very small number of data summarizations, resulting in a large reduction in runtime while giving up only very little clustering quality.
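The summarize-then-cluster pipeline described in the abstract can be illustrated with a minimal sketch that uses nothing but a distance function. This is not the paper's algorithm: the sampling strategy, the `radius` statistic, the threshold-based single-link step, and all function names (`summarize`, `single_link_labels`, `propagate`) are assumptions chosen for illustration, with edit distance on strings standing in for an arbitrary metric on non-vector data.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance: a metric on strings, i.e. on non-vector data."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def summarize(objects, dist, k, seed=0):
    """Pick k random representatives and assign every object to its nearest one.

    Only distances are used; no sums or means of objects are computed.
    (Hypothetical summary format: representative, count, radius, members.)
    """
    rng = random.Random(seed)
    reps = rng.sample(list(objects), k)
    members = [[] for _ in reps]
    for obj in objects:
        nearest = min(range(k), key=lambda i: dist(obj, reps[i]))
        members[nearest].append(obj)
    return [
        {
            "rep": rep,
            "n": len(group),
            # Largest distance from the representative to any assigned object.
            "radius": max((dist(rep, o) for o in group), default=0),
            "members": group,
        }
        for rep, group in zip(reps, members)
    ]


def single_link_labels(reps, dist, threshold):
    """Group representatives whose distance is <= threshold (union-find)."""
    parent = list(range(len(reps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            if dist(reps[i], reps[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(reps))]


def propagate(summaries, rep_labels):
    """Give every object the cluster label of its representative."""
    return [(obj, label)
            for s, label in zip(summaries, rep_labels)
            for obj in s["members"]]


words = ["cat", "cart", "car", "bat", "dog", "dot", "dig", "fog"]
summaries = summarize(words, edit_distance, k=3)
labels = single_link_labels([s["rep"] for s in summaries], edit_distance, threshold=1)
labelled = propagate(summaries, labels)
```

The clustering step runs on `k` representatives rather than on all objects, which is where the runtime saving comes from; the per-summary `n` and `radius` values are the kind of distance-only information a more careful method could use to recover structure inside each summary.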
