Home > U.S. Government Science and Technology Reports > Incremental Model-Based Clustering for Large Datasets With Small Clusters

Incremental Model-Based Clustering for Large Datasets With Small Clusters

Abstract

Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.
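The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-dimensional data and Gaussian components, skips the model-selection step by starting from a fixed number of components (`k0`), treats the lowest-density fraction (`frac`) of observations as "poorly fit", and keeps a new cluster only if BIC improves. All function names, parameters, and the BIC stopping rule are assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def density(x, model):
    # model is a list of (weight, mean, variance) tuples
    return sum(w * normal_pdf(x, m, v) for w, m, v in model)

def component_from(points, weight):
    # Moment estimates for one component; the variance floor is an assumption
    mean = sum(points) / len(points)
    var = max(sum((x - mean) ** 2 for x in points) / len(points), 1e-3)
    return (weight, mean, var)

def em(data, model, iters=30):
    for _ in range(iters):
        # E-step: accumulate responsibility-weighted sufficient statistics
        stats = [[0.0, 0.0, 0.0] for _ in model]
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in model]
            total = sum(p) or 1e-300
            for s, pk in zip(stats, p):
                r = pk / total
                s[0] += r
                s[1] += r * x
                s[2] += r * x * x
        # M-step: update weight, mean, and variance of each component
        model = []
        for n_k, sx, sxx in stats:
            n_k = max(n_k, 1e-9)
            mean = sx / n_k
            var = max(sxx / n_k - mean * mean, 1e-3)
            model.append((n_k / len(data), mean, var))
    return model

def bic(data, model):
    # Larger is better in this convention: 2 * loglik - params * log(n)
    loglik = sum(math.log(max(density(x, model), 1e-300)) for x in data)
    return 2 * loglik - (3 * len(model) - 1) * math.log(len(data))

def incremental_fit(data, sample_size=200, k0=2, frac=0.01, max_new=3, seed=0):
    rng = random.Random(seed)
    # 1. Draw a random sample and fit a k0-component model to it,
    #    initialized from equal-sized chunks of the sorted sample
    sample = sorted(rng.sample(data, sample_size))
    chunk = len(sample) // k0
    model = [component_from(sample[i * chunk:(i + 1) * chunk], 1.0 / k0)
             for i in range(k0)]
    model = em(sample, model)
    # 2. Extend the sample model to the full dataset by further EM iterations
    model = em(data, model)
    # 3. Add clusters one at a time, seeded with the poorly fit observations
    for _ in range(max_new):
        n_worst = max(2, int(frac * len(data)))
        worst = sorted(data, key=lambda x: density(x, model))[:n_worst]
        w_new = len(worst) / len(data)
        candidate = [(w * (1 - w_new), m, v) for w, m, v in model]
        candidate.append(component_from(worst, w_new))
        candidate = em(data, candidate)
        if bic(data, candidate) <= bic(data, model):
            break  # the extra cluster does not pay for its parameters
        model = candidate
    return model

if __name__ == "__main__":
    rng = random.Random(1)
    data = ([rng.gauss(0, 1) for _ in range(1000)]
            + [rng.gauss(10, 1) for _ in range(1000)]
            + [rng.gauss(30, 0.5) for _ in range(20)])  # small cluster
    for weight, mean, var in incremental_fit(data):
        print(f"weight={weight:.3f} mean={mean:.2f} var={var:.2f}")
```

In the usage example, the small cluster makes up about 1% of the data, so a 200-point sample contains only a couple of its members and the initial two-component model cannot represent it. The incremental step is what gives that cluster a component of its own.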

Bibliographic details

Authors: Fraley, C.; Raftery, A.; Wehrens, R.
Year: 2003
Pages: 1-24
Total pages: 24
Original format: PDF
Text language: English
Chinese Library Classification: Industrial technology
Keywords:
Mathematical models; Algorithms; Data processing; Set theory; Clustering; Simulation; Data management; Images; Bayes theorem; Iterations;

Similar documents

1. Incremental Model-Based Clustering for Large Datasets With Small Clusters [J]. Chris Fraley, Adrian Raftery, Ron Wehrens. Journal of Computational and Graphical Statistics: A joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America. 2005, No. 3
2. Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets [J]. Krejci Adam, Hupp Ted R., Lexa Matej. Bioinformatics. 2016, No. 1
3. Model-based clustering for image segmentation and large datasets via sampling [J]. Wehrens R, Buydens LMC, Fraley C. Journal of Classification. 2004, No. 2
4. Enhance Incremental Clustering for Time Series Datasets Using Distance Measures [C]. Sneha Khobragade, Preeti Mulay. International Conference on Intelligent Computing and Applications. 2018
5. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Singh, Sarabjit. 2014
6. Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets [O]. Adam Krejci, Ted R. Hupp, Matej Lexa.
7. Incremental Model-Based Clustering for Large Datasets with Small Clusters [O]. Fraley C., Raftery A.E., Wehrens H.R.M.J. 2005
