Understanding Errors in Approximate Distributed Latent Dirichlet Allocation

Ihler A.

Abstract

Latent Dirichlet allocation (LDA) is a popular algorithm for discovering semantic structure in large collections of text or other data. Although its complexity is linear in the data size, its use on increasingly massive collections has created considerable interest in parallel implementations. “Approximate distributed” LDA, or AD-LDA, approximates the popular collapsed Gibbs sampling algorithm for LDA models while running on a distributed architecture. Although this algorithm often appears to perform well in practice, its quality is not well understood theoretically or easily assessed on new data. In this work, we theoretically justify the approximation, and modify AD-LDA to track an error bound on performance. Specifically, we upper bound the probability of making a sampling error at each step of the algorithm (compared to an exact, sequential Gibbs sampler), given the samples drawn thus far. We show empirically that our bound is sufficiently tight to give a meaningful and intuitive measure of approximation error in AD-LDA, allowing the user to track the tradeoff between accuracy and efficiency while executing in parallel.
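
To make the approximation concrete, the following is a minimal sketch of one common way the AD-LDA scheme is described: documents are split across workers, each worker runs a collapsed Gibbs sweep against its own stale copy of the global word-topic counts, and the per-worker changes are then summed back into the global counts. This is an illustration under stated assumptions, not the authors' implementation. The function names (`init_state`, `gibbs_sweep`, `ad_lda_iteration`), the toy corpus, and the hyperparameter values are all hypothetical; the workers are simulated one after another rather than run in parallel; and the error bound the abstract describes is not computed here.

```python
import random

def init_state(shards, K, V, rng):
    """Randomly assign a topic to every token and build the count tables."""
    z, n_dk = [], []
    n_kw = [[0] * V for _ in range(K)]          # global word-topic counts
    for docs in shards:
        z_p, n_dk_p = [], []
        for doc in docs:
            z_d = [rng.randrange(K) for _ in doc]
            counts = [0] * K                     # document-topic counts
            for w, k in zip(doc, z_d):
                counts[k] += 1
                n_kw[k][w] += 1
            z_p.append(z_d)
            n_dk_p.append(counts)
        z.append(z_p)
        n_dk.append(n_dk_p)
    return {"z": z, "n_dk": n_dk, "n_kw": n_kw, "n_k": [sum(r) for r in n_kw]}

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, K, V, alpha, beta, rng):
    """One collapsed Gibbs sweep over `docs`, updating all counts in place."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the token's current assignment from the counts.
            n_dk[d][k] -= 1
            n_kw[k][w] -= 1
            n_k[k] -= 1
            # Collapsed conditional p(z = t | everything else), unnormalized.
            weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                       for t in range(K)]
            k = rng.choices(range(K), weights=weights)[0]
            z[d][i] = k
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

def ad_lda_iteration(shards, state, K, V, alpha, beta, rng):
    """One AD-LDA iteration; the workers are simulated sequentially here."""
    g_kw, g_k = state["n_kw"], state["n_k"]
    local_tables = []
    for p, docs in enumerate(shards):
        # Each worker samples against a private, stale copy of the global counts,
        # so it never sees the other workers' updates within this iteration.
        l_kw = [row[:] for row in g_kw]
        l_k = g_k[:]
        gibbs_sweep(docs, state["z"][p], state["n_dk"][p], l_kw, l_k,
                    K, V, alpha, beta, rng)
        local_tables.append(l_kw)
    # Merge step: add every worker's change (local minus old global) to the global table.
    state["n_kw"] = [[g_kw[k][w] + sum(l[k][w] - g_kw[k][w] for l in local_tables)
                      for w in range(V)] for k in range(K)]
    state["n_k"] = [sum(row) for row in state["n_kw"]]

# Toy corpus (hypothetical): 2 workers, 2 documents each, vocabulary of 6 word ids.
rng = random.Random(0)
shards = [[[0, 1, 2, 1], [2, 3, 3]], [[4, 5, 4], [0, 5, 1, 4]]]
K, V = 2, 6
state = init_state(shards, K, V, rng)
for _ in range(5):
    ad_lda_iteration(shards, state, K, V, alpha=0.1, beta=0.01, rng=rng)
```

The stale copies are where the approximation enters: an exact sequential sampler would see every earlier update before drawing the next topic, whereas each simulated worker above only sees its own. The merge step keeps the global counts consistent with the topic assignments, which is why the scheme can be run for many iterations.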

Bibliographic details

Source: Knowledge and Data Engineering, IEEE Transactions on, 2012, No. 5, pp. 952-960 (9 pages)
Author: Ihler A.
Original format: PDF
Language: English
Date indexed: 2022-08-17 13:39:14

Similar literature

1. A Hybrid Word Embedding Model Based on Admixture of Poisson-Gamma Latent Dirichlet Allocation Model and Distributed Word-Document-Topic Representation [J]. Ibrahim Bakari Bala, Mohd Zainuri Saringat, Aida Mustapha. Journal of Theoretical and Applied Information Technology. 2020, No. 9
2. Deciphering published articles on cyberterrorism: a latent Dirichlet allocation algorithm application [J]. Las Johansen Balios Caluza. International journal of data mining, modelling and management. 2019, No. 1
3. Understanding Individualization Driving States via Latent Dirichlet Allocation Model [J]. Chen Zhijun, Zhang Yishi, Wu Chaozhong. Intelligent Transportation Systems Magazine, IEEE. 2019, No. 2
4. Distributed Latent Dirichlet Allocation for Objects-Distributed Cluster Ensemble [C]. Hongjun WANG, Zhishu LI, Yang CHENG. International Conference on Natural Language Processing and Knowledge Engineering. 2008
5. Comparing latent Dirichlet allocation and latent semantic analysis as classifiers [D]. Anaya, Leticia H. 2011
6. Latent Dirichlet allocation model for world trade analysis [O]. Diego Kozlowski, Viktoriya Semeshenko, Andrea Molinari. 2021
7. Understanding Errors in Approximate Distributed Latent Dirichlet Allocation [O]. Er Ihler Member, David Newman. 2014
