A general model for clustering binary data

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Binary data arise, for example, in market basket datasets, where transactions contain items, and in document datasets, where documents contain a "bag of words". The contribution of the paper is threefold. First, a general binary data clustering model is presented. The model treats data and features equally, based on their symmetric association relations, and explicitly describes both the data assignments and the feature assignments. We characterize several variations of the general model with different optimization procedures. Second, we establish the connections between our clustering model and other existing clustering methods. Third, we discuss the problem of determining the number of clusters for binary clustering. Experimental results show the effectiveness of the proposed clustering model.
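To illustrate what "explicitly describing data assignments as well as feature assignments" can look like, here is a minimal sketch. It assumes a simplified objective that is not taken from the paper: the binary matrix W (data points as rows, features as columns) is approximated by A Bᵀ, where A and B are one-hot indicator matrices that assign each data point and each feature to one of the same K clusters, and the squared error is minimized by alternating the two assignment steps. The function names (`cocluster_binary`, `reconstruction_error`, `nearest`, `indicator_prototypes`) and the `init_cols` parameter are hypothetical; this is not the author's optimization procedure.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[int]]


def nearest(vec: Sequence[int], prototypes: List[List[int]]) -> int:
    """Index of the prototype with the smallest squared (= Hamming, for 0/1) distance."""
    return min(
        range(len(prototypes)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, prototypes[c])),
    )


def indicator_prototypes(labels: List[int], k: int) -> List[List[int]]:
    """Column c of a one-hot indicator matrix: 1 where the item is in cluster c."""
    return [[1 if lab == c else 0 for lab in labels] for c in range(k)]


def reconstruction_error(W: Matrix, rows: List[int], cols: List[int]) -> int:
    """||W - A B^T||_F^2, where (A B^T)[i][j] = 1 iff row i and column j share a cluster."""
    return sum(
        (W[i][j] - (1 if rows[i] == cols[j] else 0)) ** 2
        for i in range(len(W))
        for j in range(len(W[0]))
    )


def cocluster_binary(
    W: Matrix,
    k: int,
    iters: int = 20,
    seed: int = 0,
    init_cols: Optional[List[int]] = None,  # hypothetical: optional feature-assignment start
) -> Tuple[List[int], List[int]]:
    """Alternate data (row) and feature (column) assignments. Each step exactly
    minimizes ||W - A B^T||_F^2 over one block, so the error never increases."""
    columns = [list(col) for col in zip(*W)]
    if init_cols is None:
        rng = random.Random(seed)
        init_cols = [rng.randrange(k) for _ in columns]
    cols = list(init_cols)
    rows: List[int] = []
    for _ in range(iters):
        # Data step: with B fixed, row i of A B^T is the indicator of the features
        # in row i's cluster, so assign row i to the closest such indicator.
        row_protos = indicator_prototypes(cols, k)
        new_rows = [nearest(r, row_protos) for r in W]
        # Feature step: the same rule with the roles of data and features swapped.
        col_protos = indicator_prototypes(new_rows, k)
        new_cols = [nearest(c, col_protos) for c in columns]
        if new_rows == rows and new_cols == cols:
            break  # no assignment changed: a fixed point of the alternation
        rows, cols = new_rows, new_cols
    return rows, cols


if __name__ == "__main__":
    # Two transactions buy items {0, 1}; two others buy items {2, 3}.
    W = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    rows, cols = cocluster_binary(W, k=2, init_cols=[0, 0, 0, 1])
    print(rows, cols, reconstruction_error(W, rows, cols))  # [0, 0, 1, 1] [0, 0, 1, 1] 0
```

The symmetric treatment shows up as the same nearest-indicator rule applied to rows and to columns. Like k-means, the alternation only reaches a fixed point. For instance, starting with every feature in one cluster, the sketch stalls with all points and features in a single cluster (error 8 on the example above), so several random restarts, keeping the lowest `reconstruction_error`, would be a natural addition.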