CRD: Fast Co-clustering on Large Datasets Utilizing Sampling-Based Matrix Decomposition



Abstract

The problem of simultaneously clustering the rows and columns of a data matrix (co-clustering) arises in important applications such as text data mining, microarray analysis, and recommendation system analysis. Compared with classical clustering algorithms, co-clustering algorithms have been shown to be more effective at discovering hidden clustering structures in the data matrix. The complexity of previous co-clustering algorithms is usually O(m × n), where m and n are the numbers of rows and columns in the data matrix, respectively. This limits their applicability to data matrices with a large number of rows and columns. Moreover, some huge datasets cannot be held entirely in main memory during co-clustering, which violates an assumption made by the previous algorithms. In this paper, we propose CRD, a general framework for fast co-clustering of large datasets. By utilizing recently developed sampling-based matrix decomposition methods, CRD achieves an execution time linear in m and n. In addition, CRD does not require the whole data matrix to be in main memory. We conducted extensive experiments on both real and synthetic data. Compared with previous co-clustering algorithms, CRD achieves competitive accuracy at a much lower computational cost.
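
To make the idea of a sampling-based matrix decomposition concrete, here is a minimal Python sketch of a generic CUR-style approximation. It is not the CRD algorithm itself, whose details the abstract does not give. The function name `cur_sketch`, the norm-proportional sampling rule, and the choice of the middle factor as the pseudoinverse of the row/column intersection are all assumptions made for illustration.

```python
import numpy as np

def cur_sketch(A, c, r, rng):
    """Approximate A by C @ U @ R using c sampled columns and r sampled rows.

    Hypothetical helper for illustration only; not the method from the paper.
    """
    m, n = A.shape
    # Sampling probabilities proportional to squared column/row norms.
    # (Assumption: these sums could be accumulated in one streaming pass,
    # so the full matrix would not need to sit in memory at once.)
    col_p = (A ** 2).sum(axis=0)
    row_p = (A ** 2).sum(axis=1)
    cols = rng.choice(n, size=c, replace=False, p=col_p / col_p.sum())
    rows = rng.choice(m, size=r, replace=False, p=row_p / row_p.sum())
    C = A[:, cols]                 # m x c: actual columns of A
    R = A[rows, :]                 # r x n: actual rows of A
    W = A[np.ix_(rows, cols)]      # r x c: intersection of the samples
    # Pseudoinverse with an explicit cutoff so numerically zero singular
    # values of W are discarded rather than inverted.
    U = np.linalg.pinv(W, rcond=1e-10)
    return C, U, R

# Demo on a synthetic, exactly rank-3 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
C, U, R = cur_sketch(A, c=10, r=10, rng=rng)
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(C.shape, U.shape, R.shape, rel_err)
```

Because C and R consist of actual columns and rows of the data, a co-clustering step could in principle operate on these much smaller matrices instead of the full m × n matrix; how CRD does this is not stated in the abstract.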
