2011 IEEE International Workshop on Machine Learning for Signal Processing

Non-parametric co-clustering of large scale sparse bipartite networks on the GPU

Abstract

Co-clustering is a problem of both theoretical and practical importance, e.g., in market basket analysis, collaborative filtering, and web scale text processing. We state the co-clustering problem in terms of non-parametric generative models, which can address the issue of estimating the number of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering, we exploit the fact that parameter inference for co-clustering is well suited to parallel computing. We develop a generic GPU framework for efficient inference on large scale sparse bipartite networks and achieve a speedup of two orders of magnitude compared to estimation on conventional CPUs. In terms of scalability, we find that for networks with more than 100 million links, reliable inference can be achieved in less than an hour on a single GPU. To manage memory consumption on the GPU efficiently, we exploit the structure of the posterior likelihood to obtain a decomposition that readily allows model estimation for the co-clustering problem on arbitrarily large networks, as well as distributed estimation on multiple GPUs. Finally, we evaluate the implementation on real-life large scale collaborative filtering data and web scale text corpora, demonstrating that the latent mesoscale structures extracted by the co-clustering problem, as formulated by the Infinite Relational Model (IRM), are consistent across consecutive runs with different initializations and are also relevant for interpreting the underlying processes in such large scale networks.
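
For context, the Infinite Relational Model (IRM) referenced in the abstract is conventionally specified as the following generative process for a bipartite adjacency matrix A of size I x J. This is the standard formulation of the model rather than notation taken from the paper; the CRP concentration alpha and the Beta pseudo-counts beta+ and beta- are generic symbols.

    \begin{align*}
      z^{r} &\sim \mathrm{CRP}(\alpha) && \text{partition of the } I \text{ rows into clusters}\\
      z^{c} &\sim \mathrm{CRP}(\alpha) && \text{partition of the } J \text{ columns into clusters}\\
      \eta_{kl} &\sim \mathrm{Beta}(\beta^{+}, \beta^{-}) && \text{link probability between row cluster } k \text{ and column cluster } l\\
      A_{ij} &\sim \mathrm{Bernoulli}\bigl(\eta_{z^{r}_{i} z^{c}_{j}}\bigr) && \text{link indicator between row } i \text{ and column } j
    \end{align*}

Because the Beta prior is conjugate to the Bernoulli likelihood, the block probabilities eta can be integrated out analytically, leaving a collapsed posterior over the two partitions whose sufficient statistics are the counts of links and non-links in each (row cluster, column cluster) block.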
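To illustrate why this kind of inference parallelizes well, the sketch below accumulates the block-level link counts of a sparse bipartite network as two sparse matrix products. It is a hypothetical NumPy/SciPy illustration of the general idea only, not the paper's GPU code; the function block_link_counts and all variable names are invented for the example.

    import numpy as np
    import scipy.sparse as sp

    def block_link_counts(A, row_assign, col_assign, K, L):
        """For fixed row/column cluster assignments, count the links and
        non-links falling in each (row cluster k, column cluster l) block.
        These counts are the sufficient statistics of the collapsed
        Beta-Bernoulli likelihood sketched above."""
        n_rows, n_cols = A.shape
        # One-hot indicator matrices for the two partitions.
        R = sp.csr_matrix((np.ones(n_rows), (np.arange(n_rows), row_assign)),
                          shape=(n_rows, K))
        C = sp.csr_matrix((np.ones(n_cols), (np.arange(n_cols), col_assign)),
                          shape=(n_cols, L))
        # n_plus[k, l] = number of observed links between block k and block l.
        n_plus = (R.T @ A @ C).toarray()
        # Total number of possible links per block; non-links by subtraction.
        sizes = np.outer(np.bincount(row_assign, minlength=K),
                         np.bincount(col_assign, minlength=L))
        n_minus = sizes - n_plus
        return n_plus, n_minus

    # Toy usage on a random sparse bipartite network.
    rng = np.random.default_rng(0)
    A = sp.random(1000, 800, density=0.01, format="csr", random_state=0)
    A.data[:] = 1.0
    rows = rng.integers(0, 5, size=1000)
    cols = rng.integers(0, 4, size=800)
    n_plus, n_minus = block_link_counts(A, rows, cols, K=5, L=4)

Sparse matrix products and per-edge reductions of this kind are naturally data parallel, which is consistent with the two orders of magnitude speedup on a single GPU reported in the abstract.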
