2011 IEEE International Workshop on Machine Learning for Signal Processing

Non-parametric co-clustering of large scale sparse bipartite networks on the GPU

Abstract

Co-clustering is a problem of both theoretical and practical importance, e.g., in market basket analysis, collaborative filtering, and web scale text processing. We state the co-clustering problem in terms of non-parametric generative models, which can address the issue of estimating the number of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering, we exploit the fact that parameter inference for co-clustering is well suited to parallel computing. We develop a generic GPU framework for efficient inference on large scale sparse bipartite networks and achieve a speedup of two orders of magnitude compared to estimation on conventional CPUs. In terms of scalability, we find that for networks with more than 100 million links, reliable inference can be achieved in less than an hour on a single GPU. To manage memory consumption on the GPU efficiently, we exploit the structure of the posterior likelihood to obtain a decomposition that readily allows model estimation for the co-clustering problem on arbitrarily large networks, as well as distributed estimation on multiple GPUs. Finally, we evaluate the implementation on real-life large scale collaborative filtering data and web scale text corpora, demonstrating that the latent mesoscale structures extracted by the co-clustering problem, as formulated by the Infinite Relational Model (IRM), are consistent across consecutive runs with different initializations and are also relevant for interpreting the underlying processes in such large scale networks.
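
For context, the Infinite Relational Model (IRM) referenced in the abstract is conventionally specified as the following generative process for a bipartite adjacency matrix A of size I x J. This is the standard formulation of the model rather than notation taken from the paper; the CRP concentration alpha and the Beta pseudo-counts beta+ and beta- are generic symbols.

    \begin{align*}
      z^{r} &\sim \mathrm{CRP}(\alpha) && \text{partition of the } I \text{ rows into clusters}\\
      z^{c} &\sim \mathrm{CRP}(\alpha) && \text{partition of the } J \text{ columns into clusters}\\
      \eta_{kl} &\sim \mathrm{Beta}(\beta^{+}, \beta^{-}) && \text{link probability between row cluster } k \text{ and column cluster } l\\
      A_{ij} &\sim \mathrm{Bernoulli}\bigl(\eta_{z^{r}_{i} z^{c}_{j}}\bigr) && \text{link indicator between row } i \text{ and column } j
    \end{align*}

Because the Beta prior is conjugate to the Bernoulli likelihood, the block probabilities eta can be integrated out analytically, leaving a collapsed posterior over the two partitions whose sufficient statistics are the counts of links and non-links in each (row cluster, column cluster) block.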
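To illustrate why this kind of inference parallelizes well, the sketch below accumulates the block-level link counts of a sparse bipartite network as two sparse matrix products. It is a hypothetical NumPy/SciPy illustration of the general idea only, not the paper's GPU code; the function block_link_counts and all variable names are invented for the example.

    import numpy as np
    import scipy.sparse as sp

    def block_link_counts(A, row_assign, col_assign, K, L):
        """For fixed row/column cluster assignments, count the links and
        non-links falling in each (row cluster k, column cluster l) block.
        These counts are the sufficient statistics of the collapsed
        Beta-Bernoulli likelihood sketched above."""
        n_rows, n_cols = A.shape
        # One-hot indicator matrices for the two partitions.
        R = sp.csr_matrix((np.ones(n_rows), (np.arange(n_rows), row_assign)),
                          shape=(n_rows, K))
        C = sp.csr_matrix((np.ones(n_cols), (np.arange(n_cols), col_assign)),
                          shape=(n_cols, L))
        # n_plus[k, l] = number of observed links between block k and block l.
        n_plus = (R.T @ A @ C).toarray()
        # Total number of possible links per block; non-links by subtraction.
        sizes = np.outer(np.bincount(row_assign, minlength=K),
                         np.bincount(col_assign, minlength=L))
        n_minus = sizes - n_plus
        return n_plus, n_minus

    # Toy usage on a random sparse bipartite network.
    rng = np.random.default_rng(0)
    A = sp.random(1000, 800, density=0.01, format="csr", random_state=0)
    A.data[:] = 1.0
    rows = rng.integers(0, 5, size=1000)
    cols = rng.integers(0, 4, size=800)
    n_plus, n_minus = block_link_counts(A, rows, cols, K=5, L=4)

Sparse matrix products and per-edge reductions of this kind are naturally data parallel, which is consistent with the two orders of magnitude speedup on a single GPU reported in the abstract.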
