Journal: IEEE Transactions on Parallel and Distributed Systems

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Abstract

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Many applications require it to handle large datasets and a large number of topics, e.g., tens of thousands of topics for industry-scale applications. Although distributed CPU systems have been used to address this problem, they are slow and resource-inefficient. GPU-based systems have emerged as a promising alternative because of their high computational power and memory bandwidth. However, existing GPU-based LDA systems can only learn thousands of topics, because they use dense data structures and their time complexity is linear in the number of topics. In this article, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm whose time complexity is sublinear in the number of topics, which makes learning a large number of topics practical. To address the challenges introduced by sparsity, we propose a novel data layout, a warp-based sampling kernel, an efficient sparse matrix counting method, and a fine-grained load balancing strategy. SaberLDA achieves linear speedup on 4 GPUs and is 6-10 times faster than existing GPU systems when learning thousands of topics. It can learn 40,000 topics from a dataset of billions of tokens in two hours, which was previously achievable only with clusters of tens of CPU servers.
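The abstract does not spell out the sparsity-aware algorithm. As a rough illustration of how per-token sampling cost can become sublinear in the number of topics, the Python sketch below uses a common decomposition from the sparse LDA literature. It is an assumption for illustration, not SaberLDA's actual kernel. The topic weight (doc_counts[k] + alpha) * word_probs[k] is split into a sparse part, which touches only the topics present in the document, and a dense part, which is served by a cumulative table precomputed once per word. The names sample_topic, doc_counts, word_probs and word_cdf are hypothetical, and alpha is assumed to be strictly positive.

```python
import bisect
import random
from itertools import accumulate


def sample_topic(doc_counts, word_probs, word_cdf, alpha, rng):
    """Draw topic k with probability proportional to
    (doc_counts.get(k, 0) + alpha) * word_probs[k].

    doc_counts: dict {topic: count} with only the nonzero counts of one document.
    word_probs: list of length K, weight of the current word under each topic.
    word_cdf:   running sums of word_probs, built once per word and reused.
    alpha:      document-topic smoothing, assumed > 0.
    """
    # Sparse part: cost grows with the number of topics in the document, not K.
    topics = list(doc_counts)
    sparse_cdf = list(accumulate(doc_counts[k] * word_probs[k] for k in topics))
    sparse_mass = sparse_cdf[-1] if sparse_cdf else 0.0

    # Dense part: alpha * sum_k word_probs[k], read from the precomputed table.
    dense_mass = alpha * word_cdf[-1]

    u = rng.random() * (sparse_mass + dense_mass)
    if u < sparse_mass:
        i = bisect.bisect_right(sparse_cdf, u)
        return topics[min(i, len(topics) - 1)]

    # Binary search in the shared per-word table: O(log K) for this branch.
    target = (u - sparse_mass) / alpha
    return min(bisect.bisect_right(word_cdf, target), len(word_cdf) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    word_probs = [0.5, 0.5]
    word_cdf = list(accumulate(word_probs))
    draws = [sample_topic({0: 3}, word_probs, word_cdf, 1.0, rng) for _ in range(10)]
    print(draws)
```

Under these assumptions a draw costs roughly the number of nonzero document-topic counts plus a logarithmic search, rather than a full pass over all K topics. A GPU implementation would additionally need the data layout, warp-level sampling, and load balancing that the abstract lists.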