Using Low-Memory Representations to Cluster Very Large Data Sets


Abstract

Many of the algorithms designed to cluster large data sets compute representations of the data that are based on a single vector, without a unique representation of the original data items. We present an extension of Principal Direction Divisive Partitioning (PDDP) which creates a least-squares approximation of the data based on a small number of vectors. We show that the extension can save significant amounts of memory while clustering the data as well as the original method does. We also show that in some cases using more than one vector to approximate each data item results in higher-quality clusterings.
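
The two ideas named in the abstract, a principal-direction split and a least-squares approximation of each data item from a small number of vectors, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `pddp_split` and `approximate`, the column-per-item data layout, and the use of cluster centroids as the representative vectors are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def pddp_split(X):
    """One principal-direction split of X (one data item per column).

    Returns two boolean masks over the columns of X.
    """
    centered = X - X.mean(axis=1, keepdims=True)
    # The leading left singular vector of the centered data is the
    # principal direction.
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    proj = U[:, 0] @ centered
    # Items are divided by the sign of their projection.
    return proj <= 0, proj > 0

def approximate(X, C, k):
    """Least-squares approximation X ~ C @ Z with at most k nonzeros
    per column of Z.

    Assumption: each item is fitted with its k nearest columns of C.
    """
    Z = np.zeros((C.shape[1], X.shape[1]))
    for j in range(X.shape[1]):
        x = X[:, j]
        dists = np.linalg.norm(C - x[:, None], axis=0)
        idx = np.argsort(dists)[:k]
        coef, *_ = np.linalg.lstsq(C[:, idx], x, rcond=None)
        Z[idx, j] = coef
    return Z
```

With `k = 1` each item is represented by a single scaled vector; with `k > 1` the fit residual for each item cannot grow, because the nearest vector is still among the ones used. Storing `C` and the few nonzeros of `Z` instead of `X` is where a memory saving would come from in a sketch like this.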

Bibliographic details

Source
SIAM International Conference on Data Mining | 2003 | xiv 347 p. | 5 pages
Conference location
Authors
David Littau; Daniel Boley
Author affiliation

Conference organizer
Original format: PDF
Text language
Chinese Library Classification: TP3-53
Keywords
Clustering; Large data sets; PDDP; Data mining; Principal directions; Matrix approximation

Similar literature


1. Streaming data reduction using low-memory factored representations [J] . Littau D, Boley D Information Sciences: An International Journal . 2006, No. 14

2. Gradual representation of shadowed set for clustering gene expression data [J] . Bose Ankita, Mali Kalyani Applied Soft Computing . 2019

3. CLUSTERING VERY LARGE DATA SETS USING A LOW MEMORY MATRIX FACTORED REPRESENTATION [J] . David Littau, Daniel Boley Computational Intelligence . 2009, No. 2

4. Using Low-Memory Representations to Cluster Very Large Data Sets [C] . David Littau, Daniel Boley SIAM International Conference on Data Mining . 2003

5. Using a low-memory factored representation to data mine large data sets. [D] . Littau, David William. 2005

6. The Effect of Species Representation on the Detection of Positive Selection in Primate Gene Data Sets [O] . Ross M. McBee, Shea A. Rozmiarek, Nicholas R. Meyerson, -1

7. Using Low-Memory Representations to Cluster Very Large Data Sets [O] . David Littau, Daniel Boley 2003
