Hierarchical model-based clustering of large datasets through fractionation and refractionation

Abstract

The goal of clustering is to identify distinct groups in a dataset. Compared to non-parametric clustering methods like complete linkage, hierarchical model-based clustering has the advantage of offering a way to estimate the number of groups present in the data. However, its computational cost is quadratic in the number of items to be clustered, and it is therefore not applicable to large problems. We review an idea called Fractionation, originally conceived by Cutting, Karger, Pedersen and Tukey for non-parametric hierarchical clustering of large datasets, and describe an adaptation of Fractionation to model-based clustering. A further extension, called Refractionation, leads to a procedure that can be successful even in the difficult situation where there are large numbers of small groups.
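
The Fractionation idea can be sketched in a few lines. What follows is a minimal, hypothetical Python illustration, not the authors' implementation, and it rests on stated assumptions. Items are split at random into fractions of at most `fraction_size`. Each fraction is agglomerated down to about `reduction` times its size. The resulting groups are treated as meta-observations, and the process repeats until one final agglomeration is affordable. As a stand-in for the model-based merge criterion the sketch uses Ward's rule, which corresponds to model-based hierarchical clustering with spherical, equal-volume Gaussian components. The number of groups `k` is given rather than estimated, and Refractionation is not sketched. The names `Group`, `ward_cost`, `agglomerate`, `fractionation`, `fraction_size` and `reduction` are all illustrative.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Group:
    """A meta-observation: a set of original points summarized by its coordinate sum."""
    members: list  # indices of the original points in this group
    total: list    # coordinate-wise sum of the member points

    def centroid(self):
        return [t / len(self.members) for t in self.total]


def ward_cost(a, b):
    """Increase in within-group sum of squares caused by merging groups a and b.

    Assumed stand-in for the model-based criterion: Ward's rule corresponds to
    spherical Gaussian components with equal volume.
    """
    na, nb = len(a.members), len(b.members)
    dist2 = sum((x - y) ** 2 for x, y in zip(a.centroid(), b.centroid()))
    return na * nb / (na + nb) * dist2


def agglomerate(groups, k):
    """Naive hierarchical agglomeration: merge the cheapest pair until k groups remain."""
    groups = list(groups)
    while len(groups) > k:
        i, j = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda pair: ward_cost(groups[pair[0]], groups[pair[1]]),
        )
        a, b = groups[i], groups[j]
        groups[i] = Group(a.members + b.members,
                          [x + y for x, y in zip(a.total, b.total)])
        del groups[j]  # j > i, so position i is unaffected
    return groups


def fractionation(points, k, fraction_size=50, reduction=0.2, seed=0):
    """Cluster points into at most k groups, never agglomerating more than
    fraction_size items at once (hypothetical sketch)."""
    if fraction_size < 2 or not 0 < reduction < 1:
        raise ValueError("need fraction_size >= 2 and 0 < reduction < 1")
    rng = random.Random(seed)
    groups = [Group([i], list(p)) for i, p in enumerate(points)]
    while len(groups) > fraction_size:
        rng.shuffle(groups)  # random assignment of items to fractions
        n_fractions = math.ceil(len(groups) / fraction_size)
        reduced = []
        for f in range(n_fractions):
            fraction = groups[f::n_fractions]  # near-equal sizes, each <= fraction_size
            # Shrink to about reduction * size meta-observations; the cap at
            # size - 1 guarantees progress, the floor of 1 keeps it non-empty.
            target = max(1, min(len(fraction) - 1, math.ceil(reduction * len(fraction))))
            reduced.extend(agglomerate(fraction, target))
        groups = reduced
    return agglomerate(groups, k)


if __name__ == "__main__":
    # Three tight, well-separated synthetic clusters of 40 points each.
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points = [(cx + (i % 8) * 0.1, cy + (i // 8) * 0.1)
              for cx, cy in centers for i in range(40)]
    clusters = fractionation(points, k=3, fraction_size=30, reduction=0.25)
    print(sorted(len(g.members) for g in clusters))  # [40, 40, 40]
```

Because agglomeration only ever runs on at most `fraction_size` items, each pass costs roughly linear time in the number of items for a fixed fraction size, instead of the quadratic cost of clustering everything at once. The sketch also exposes the weak spot: if a fraction holds more true groups than its target size, groups from different clusters are forced together, which is plausibly the "large numbers of small groups" situation the abstract says Refractionation is meant to handle.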

Bibliographic details

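Source: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002) | 2002 | pp. 183-190 | 8 pages
Conference location: Edmonton
Authors: Jeremy Tantrum; Alejandro Murua; Werner Stuetzle
Original format: PDF
Full-text language: English
CLC classification: computing technology, computer technology; data processing, data processing systems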
Keywords: refractionation

Similar documents

1. Hierarchical model-based clustering of large datasets through fractionation and refractionation [J]. Jeremy Tantrum, Alejandro Murua, Werner Stuetzle. Information Systems, 2004, no. 4.

2. Hierarchical Model-Based Clustering for Large Datasets [J]. Christian Posse. Journal of Computational and Graphical Statistics (a joint publication of the American Statistical Association, the Institute of Mathematical Statistics, and the Interface Foundation of North America), 2001, no. 3.

3. Incremental Model-Based Clustering for Large Datasets With Small Clusters [J]. Chris Fraley, Adrian Raftery, Ron Wehrens. Journal of Computational and Graphical Statistics (a joint publication of the American Statistical Association, the Institute of Mathematical Statistics, and the Interface Foundation of North America), 2005, no. 3.

4. Hierarchical Model-Based Clustering of Large Datasets Through Fractionation and Refractionation [C]. Jeremy Tantrum, Alejandro Murua, Werner Stuetzle. Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Jul 23-26, 2002.

5. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Sarabjit Singh, 2014.

6. Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets [O]. Adam Krejci, Ted R. Hupp, Matej Lexa.

7. Hierarchical model-based clustering of large datasets through fractionation and refractionation [O]. Jeremy Tantrum, Alejandro Murua, Werner Stuetzle, 2012.

8. Incremental Model-Based Clustering for Large Datasets With Small Clusters [R]. C. Fraley, A. Raftery, R. Wehrens, 2003.
