A general model for clustering binary data

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data, which arises in market basket datasets, where transactions contain items, and in document datasets, where each document is represented as a "bag of words". The contribution of the paper is three-fold. First, a general binary data clustering model is presented. The model treats the data and features equally, based on their symmetric association relations, and explicitly describes the data assignments as well as the feature assignments. We characterize several variations of the general model with different optimization procedures. Second, we establish the connections between our clustering model and other existing clustering methods. Third, we discuss the problem of determining the number of clusters for binary clustering. Experimental results show the effectiveness of the proposed clustering model.
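
The abstract describes data assignments and feature assignments that are treated symmetrically, but gives no formulas. The following is only a minimal sketch of that general idea, assuming a binary matrix W (rows are data points, columns are features) is approximated by A @ B.T, where A holds one cluster per row and B marks the features associated with each cluster. The function name cluster_binary, the init_rows parameter, the Hamming-distance data step, and the majority-vote feature step are all illustrative assumptions, not the optimization procedure from the paper.

```python
import numpy as np

def cluster_binary(W, init_rows, n_iter=20):
    """Alternate data and feature assignments for a binary matrix W.

    Hypothetical sketch: init_rows picks the rows of W used as the
    initial feature pattern of each cluster (an assumption made here
    so that the result is deterministic).
    """
    W = np.asarray(W, dtype=int)
    B = W[list(init_rows)].T.copy()          # (m, k) feature assignments
    k = B.shape[1]
    labels = None
    for _ in range(n_iter):
        # Data step: move each row to the cluster whose feature pattern
        # differs from it in the fewest positions (Hamming distance).
        dist = (W[:, None, :] != B.T[None, :, :]).sum(axis=2)   # (n, k)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                            # assignments are stable
        labels = new_labels
        # Feature step: a feature joins cluster j if more than half of
        # the rows currently in cluster j contain it.
        for j in range(k):
            members = W[labels == j]
            if len(members) > 0:
                B[:, j] = (members.mean(axis=0) > 0.5).astype(int)
    A = np.eye(k, dtype=int)[labels]         # (n, k) data assignments
    return A, B

# Toy market-basket style data: two blocks of rows, each with one noisy entry.
W = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
])
A, B = cluster_binary(W, init_rows=(1, 4))
mismatches = int((A @ B.T != W).sum())       # entries not explained by the blocks
```

In this toy run the rows split into the two visible blocks, and mismatches counts the entries of W that the block approximation A @ B.T does not reproduce. A random or data-driven initialization could replace init_rows, at the cost of results that depend on the starting point.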

Bibliographic details

Source
ACM SIGKDD international conference on Knowledge discovery in data mining, 2005, pp. 188-197 (10 pages)
Authors
Tao Li
Original format
PDF
Chinese Library Classification
Computing technology, computer technology
Keywords
matrix approximation

Similar literature

1. Modeling clustered binary data with excess zero clusters [J]. John Kwagyan, Victor Apprey. Statistical methods in medical research. 2018, no. 9
2. Clustered binary data with random cluster sizes: a dual Poisson modelling approach [J]. Renjun Ma, Bent Jorgensen, Jon Douglas Willms. Statistical modeling: applications in contemporary issues. 2009, no. 2
3. Joint modeling of binary response and survival for clustered data in clinical trials [J]. Chen Bingshu E., Wang Jia. Statistics in medicine. 2020, no. 3
4. Performance of Mixed Effects for Clustered Binary Data Models [C]. Intesar N. El-Saeiti. ISM International Statistical Conference. 2015
5. Analysis of models for longitudinal and clustered binary data [D]. Yang, Weiming. 2010
6. A modeling framework for the analysis of HPV incidence and persistence: a semi-parametric approach for clustered binary longitudinal data analysis [O]. Xiangrong Kong, Ronald H. Gray, Lawrence H. Moulton
7. MULTIMORBIDITY CLUSTERS: CLUSTERING BINARY DATA FROM A LARGE ADMINISTRATIVE MEDICAL DATABASE [O]. John E. Cornell, Jacqueline A. Pugh, John W. Williams. 2015
