Scalable Algorithm for Clustering Sequential Data.


Abstract

In recent years, we have seen an enormous growth in the amount of available commercial and scientific data. Data from domains such as protein sequences, retail transactions, intrusion detection, and Web logs have an inherently sequential nature. Clustering such data sets is useful for various purposes. For example, clustering sequences from commercial data sets may help marketers identify different customer groups based on their purchasing patterns. Grouping protein sequences that share similar structure helps in identifying sequences with similar functionality. Over the years, many methods have been developed for clustering objects according to their similarity. However, these methods tend to have a computational complexity that is at least quadratic in the number of sequences. In this paper we present an entirely different approach to sequence clustering that does not require an all-against-all analysis and uses a near-linear-complexity, K-means-based clustering algorithm. Our experiments using data sets derived from sequences of purchasing transactions and protein sequences show that this approach is scalable and leads to reasonably good clusters.
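
The idea of avoiding an all-against-all comparison can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how sequences are turned into vectors, so the k-mer frequency features below are an assumption, and all function names (`build_vocab`, `kmer_features`, `kmeans`, `cluster_sequences`) are hypothetical. Each sequence is mapped once to a fixed-length vector, and plain K-means then compares every vector only to the cluster centers, so the cost per iteration grows linearly with the number of sequences.

```python
import random
from collections import Counter


def build_vocab(seqs, k):
    """Collect every k-mer that occurs in any sequence (assumed feature set)."""
    grams = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            grams.add(s[i:i + k])
    return sorted(grams)


def kmer_features(seq, k, vocab):
    """Map one sequence to a vector of relative k-mer frequencies."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1  # guard against sequences shorter than k
    return [counts[g] / total for g in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain K-means: each pass compares points to centers, never to each other."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(n_clusters), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sequences(seqs, n_clusters, k=2, seed=0):
    """Vectorize sequences with k-mer frequencies, then run K-means."""
    vocab = build_vocab(seqs, k)
    points = [kmer_features(s, k, vocab) for s in seqs]
    return kmeans(points, n_clusters, seed=seed)


if __name__ == "__main__":
    toy = ["ABABABAB", "BABABABA", "ABABAB", "CDCDCDCD", "DCDCDCDC", "CDCDCD"]
    print(cluster_sequences(toy, n_clusters=2))
```

On the toy input, the three AB-type sequences should end up in one cluster and the three CD-type sequences in the other; which cluster receives which label depends on the random initialization.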

Bibliographic details

Authors: Guralnik, V.; Karypis, G.
Year: 2001
Pages: 1-15
Total pages: 15
Original format: PDF
Keywords: Clustering; Sequences; Data mining; Proteins; Algorithms; Digital data; Feature selection; Molecular biophysics; Computational complexity theory; Fox proteins; Protein sequences; Scalable algorithms; Sequential data clustering; Dataset clustering; Computational complex


Similar literature


1. Detecting Variability in Massive Astronomical Time-series Data. III. Variable Candidates in the SuperWASP DR1 Found by Multiple Clustering Algorithms and a Consensus Clustering Method [J]. Min-Su Shin, Seo-Won Chang, Hahn Yi, The Astronomical Journal. 2018, No. 5
2. Structures and Properties of Core-Shell B@Mn-8@Mg-10 Cluster and Brief of Efficient Core-Shell Cluster Algorithm and Global Minima Algorithm Based on a Sequential Microdisplacement Press-Expand Method [J]. Fang Bai, Ge Gui-Xian, Wang Bao-Lin, The Journal of Physical Chemistry, C. Nanomaterials and Interfaces. 2019, No. 17
3. Algorithms for Sequential Extraction of Clusters by Possibilistic Method and Comparison with Mountain Clustering [J]. Sadaaki Miyamoto, Youhei Kuroda, Kenta Arai, Journal of Advanced Computational Intelligence and Intelligent Informatics. 2008, No. 5
4. A scalable algorithm for clustering sequential data [C]. Guralnik, V., Karypis, . 2001
5. A sequential clustering algorithm with applications to gene expression data. [D]. Song, Jongwoo. 2003
6. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale [O]. Scott Emmons, Stephen Kobourov, Mike Gallant, -1
7. Detecting Variability in Massive Astronomical Time-series Data. III. Variable Candidates in the SuperWASP DR1 Found by Multiple Clustering Algorithms and a Consensus Clustering Method [O]. Min-Su Shin, Seo-Won Chang, Hahn Yi, 2018
