SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Kaiwei Li; Jianfei Chen; Wenguang Chen; Jun Zhu

Abstract

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Although distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm, which improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn from billion-token-scale data with up to 10,000 topics, almost two orders of magnitude more than previous GPU-based systems. With a single GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of tokens in a few hours, which was previously achievable only with clusters of tens of machines.
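The abstract does not spell out the sampler. In sparsity-aware LDA samplers, sublinear cost usually comes from splitting each token's topic distribution into two parts. One part is sparse and document-specific. The other is dense but document-independent and precomputed. The Python sketch below illustrates that general idea under stated assumptions. It samples a topic with probability proportional to (C_dk + α)·φ_wk. Here C_dk is the document-topic count, and φ_wk is a topic-word probability assumed fixed during the sampling pass. The function names `build_dense_cdf` and `sample_topic` are hypothetical. This is a CPU illustration of the decomposition, not SaberLDA's exact algorithm or its warp-based GPU kernel.

```python
import bisect
import random
from typing import Dict, List


def build_dense_cdf(phi_w: List[float], alpha: float) -> List[float]:
    """Prefix sums of alpha * phi_w[k] over all K topics (hypothetical helper).

    This term does not depend on the document, so it can be built once per
    word and reused by every token of that word.
    """
    cdf: List[float] = []
    total = 0.0
    for p in phi_w:
        total += alpha * p
        cdf.append(total)
    return cdf


def sample_topic(doc_counts: Dict[int, int], phi_w: List[float],
                 dense_cdf: List[float], rng: random.Random) -> int:
    """Draw topic k with probability proportional to (C_dk + alpha) * phi_w[k].

    doc_counts holds only the nonzero C_dk entries of one document, so the
    sparse bucket sum_k C_dk * phi_w[k] costs O(K_d) rather than O(K).
    (Hypothetical helper; assumes phi_w is fixed for this pass.)
    """
    # Sparse bucket: only the topics that already occur in the document.
    topics: List[int] = []
    sparse_cdf: List[float] = []
    s = 0.0
    for k, c in doc_counts.items():
        if c > 0:
            s += c * phi_w[k]
            topics.append(k)
            sparse_cdf.append(s)
    # Dense bucket mass alpha * sum_k phi_w[k], read from the precomputed prefix sums.
    q = dense_cdf[-1] if dense_cdf else 0.0
    if s + q <= 0.0:
        raise ValueError("all topic weights are zero")
    u = rng.random() * (s + q)
    if u < s:
        # Binary search over the sparse prefix sums of this document.
        i = bisect.bisect_right(sparse_cdf, u)
        return topics[min(i, len(topics) - 1)]
    # Dense bucket: O(log K) binary search over the shared prefix sums.
    i = bisect.bisect_right(dense_cdf, u - s)
    return min(i, len(dense_cdf) - 1)
```

Each token costs O(K_d) for the sparse bucket plus O(log K) for the dense draw. K_d is the number of distinct topics in the document, and it is often much smaller than K when K is large. An alias table in place of the binary search would make the dense draw O(1) at the price of a more involved build step.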

Bibliographic Record

Source
Computer Architecture News | 2017, no. 1 | pp. 497-509 | 13 pages
Authors
Kaiwei Li; Jianfei Chen; Wenguang Chen; Jun Zhu
Affiliations

Department of Computer Science and Technology, CBICR Center, Tsinghua University, China;

Department of Computer Science and Technology, CBICR Center, Tsinghua University, China; State Key Lab for Intelligent Technology and Systems, TNList Lab, China;

Department of Computer Science and Technology, CBICR Center, Tsinghua University, China; Research Institute of Tsinghua University in Shenzhen, Shenzhen, China;

Department of Computer Science and Technology, CBICR Center, Tsinghua University, China; State Key Lab for Intelligent Technology and Systems, TNList Lab, China;

Record Information
Original format: PDF
Language: English

Similar Documents


1. SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs [J]. Li Kaiwei, Chen Jianfei, Chen Wenguang. IEEE Transactions on Parallel and Distributed Systems, 2020, no. 9.

2. SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs [J]. Li Kaiwei, Chen Jianfei, Chen Wenguang. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2017, no. 4.

3. Topic Subject Creation Using Unsupervised Learning for Topic Modeling [J]. Rashid Mehdiyev, Jean Nava, Karan Sodhi. Computer and Information Science, 2020, no. 3.

4. From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering [C]. Ramnath Balasubramanyan, Bhavana Dalvi, William W. Cohen. European Conference on Machine Learning and Knowledge Discovery in Databases, 2013.

5. Three Research Topics in Education: (1) Associations between Approaches to Learning and Academic Achievement; (2) a Meta-Analytic Review on Approaches to Learning and Academic Achievement; (3) Power Analysis in Meta-Analysis: A Three-Level Model [D]. Zhang, Bixi. 2021.

6. Spiking neural network model of reinforcement learning in the honeybee implemented on the GPU [O]. Esin Yavuz, Pascale Maul, Thomas Nowotny. 2015.

7. SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs [O]. Kaiwei Li, Jianfei Chen, Wenguang Chen. 2020.
