SIAM International Conference on Data Mining

Using Low-Memory Representations to Cluster Very Large Data Sets

Abstract

Many of the algorithms designed to cluster large data sets compute representations of the data that are based on a single vector, without retaining a unique representation of each original data item. We present an extension of Principal Direction Divisive Partitioning (PDDP) that creates a least-squares approximation of the data based on a small number of vectors. We show that the extension can save significant amounts of memory while clustering the data as well as the original method does. We also show that, in some cases, using more than one vector to approximate each data item yields higher-quality clusterings.
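
The two ideas in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The abstract does not say how the small set of approximating vectors is chosen, so this sketch assumes a given list of representative vectors `reps` (for example, centroids) and fits each item by least squares onto its `k` nearest representatives, storing only `k` indices and `k` coefficients per item. `pddp_split` shows one standard PDDP step: center the data, find the leading principal direction (here by power iteration, an assumed choice), and split the items by the sign of their projections. All function names and parameters are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("representative vectors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def approximate(item, reps, k):
    """Hypothetical low-memory encoding: least-squares fit of `item` onto its k nearest reps.

    Returns (indices, coefficients), i.e. 2k numbers instead of len(item) numbers."""
    dist = lambda j: sum((a - b) ** 2 for a, b in zip(item, reps[j]))
    nearest = sorted(range(len(reps)), key=dist)[:k]
    basis = [reps[j] for j in nearest]
    gram = [[dot(u, v) for v in basis] for u in basis]  # normal equations: B^T B
    rhs = [dot(u, item) for u in basis]                  # right-hand side: B^T x
    return nearest, solve(gram, rhs)


def reconstruct(indices, coeffs, reps):
    """Rebuild the approximate item as a linear combination of the chosen reps."""
    dim = len(reps[0])
    return [sum(c * reps[j][d] for j, c in zip(indices, coeffs)) for d in range(dim)]


def pddp_split(points, iters=100):
    """One PDDP step: split by the sign of the projection onto the leading principal direction."""
    n, dim = len(points), len(points[0])
    w = [sum(p[d] for p in points) / n for d in range(dim)]          # centroid
    centered = [[p[d] - w[d] for d in range(dim)] for p in points]
    u = [1.0 / math.sqrt(dim)] * dim  # assumed start vector for power iteration
    for _ in range(iters):
        y = [0.0] * dim
        for c in centered:             # y = (sum_i c_i c_i^T) u
            s = dot(c, u)
            for d in range(dim):
                y[d] += s * c[d]
        norm = math.sqrt(dot(y, y))
        if norm == 0.0:                # all points identical: nothing to split on
            break
        u = [v / norm for v in y]
    left = [i for i, c in enumerate(centered) if dot(c, u) <= 0]
    right = [i for i, c in enumerate(centered) if dot(c, u) > 0]
    return left, right
```

With `reps = [[1, 1, 0], [1, -1, 0], [0, 0, 1]]` and `k = 2`, the item `[3, 1, 0]` is stored as indices `[0, 1]` with coefficients `[2.0, 1.0]` and is reconstructed exactly. An item outside the span of its nearest representatives is reconstructed only approximately, which is the memory/accuracy trade-off the abstract describes. Under these assumptions, one could run `pddp_split` on the reconstructed items instead of the originals.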