SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Abstract

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm, which together improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn from billion-token-scale data with up to 10,000 topics, almost two orders of magnitude more topics than previous GPU-based systems support. With a single GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of tokens in a few hours, which was previously achievable only with clusters of tens of machines.
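
To make the dense-versus-sparse contrast in the abstract concrete, the following is a minimal CPU sketch in Python of one common sparsity-aware decomposition for LDA topic sampling. It is an illustration under stated assumptions, not SaberLDA's GPU kernel or data layout. The names `build_word_prior`, `sample_topic`, `doc_counts`, `phi_w`, and `alpha` are hypothetical. The sketch assumes the per-token topic distribution is proportional to (n_dk + alpha) * phi_w[k], which splits into a sparse part (only topics already present in the document) and a prior part that depends only on the word and can be precomputed once.

```python
import bisect
import random
from itertools import accumulate


def build_word_prior(phi_w, alpha):
    """Cumulative sums of alpha * phi_w[k] for one word.

    This costs O(K), but it is done once per word and reused for every
    occurrence of that word (assumption: phi_w is fixed during a sweep).
    """
    return list(accumulate(alpha * p for p in phi_w))


def sample_topic(doc_counts, phi_w, prior_cum, rng):
    """Draw topic k with probability proportional to (n_dk + alpha) * phi_w[k].

    doc_counts: dict {topic: count} with only the nonzero document-topic counts.
    phi_w:      word-topic probabilities for the current word, length K.
    prior_cum:  output of build_word_prior for the same word.
    """
    # Sparse part: iterate only over topics that occur in this document.
    topics = list(doc_counts)
    sparse_cum = list(accumulate(doc_counts[k] * phi_w[k] for k in topics))
    sparse_mass = sparse_cum[-1] if sparse_cum else 0.0
    prior_mass = prior_cum[-1]

    u = rng.random() * (sparse_mass + prior_mass)
    if u < sparse_mass:
        return topics[bisect.bisect_right(sparse_cum, u)]

    # Prior part: binary search over the precomputed table, O(log K).
    idx = bisect.bisect_right(prior_cum, u - sparse_mass)
    return min(idx, len(prior_cum) - 1)  # guard against rounding at the edge


if __name__ == "__main__":
    rng = random.Random(0)
    phi_w = [0.1, 0.2, 0.3, 0.4]          # hypothetical word-topic column
    prior_cum = build_word_prior(phi_w, alpha=0.5)
    print(sample_topic({2: 3}, phi_w, prior_cum, rng))
```

In this sketch the per-token cost is proportional to the number of distinct topics in the document plus a logarithmic search over the prior table, rather than to the total number of topics K, which is the kind of sublinear behavior the abstract refers to.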

Bibliographic information

  • Source
    Computer architecture news | 2017, Issue 1 | pp. 497-509 | 13 pages
  • Author affiliations

    Department of Computer Science and Technology, CBICR Center, Tsinghua University, China;

    Department of Computer Science and Technology, CBICR Center, Tsinghua University, China, State Key Lab for Intelligent Technology and Systems, TNList Lab, China;

    Department of Computer Science and Technology, CBICR Center, Tsinghua University, China, Research Institute of Tsinghua University in Shenzhen, Shenzhen, China;

    Department of Computer Science and Technology, CBICR Center, Tsinghua University, China, State Key Lab for Intelligent Technology and Systems, TNList Lab, China;

  • Original format: PDF
  • Language: English