
Rethinking LDA: Why Priors Matter

Abstract

Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such "smoothing parameters" have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
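
As a concrete illustration of the configuration the abstract advocates (an asymmetric, data-optimized Dirichlet prior over the document-topic distributions combined with a fixed symmetric prior over the topic-word distributions), the sketch below uses gensim's LdaModel with alpha="auto" and eta="symmetric". This is an assumed off-the-shelf approximation of that recommendation, not the authors' own implementation; the toy corpus and parameter values are purely illustrative.

    # Minimal sketch (not the authors' code): asymmetric, optimized
    # document-topic prior plus a fixed symmetric topic-word prior.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Toy documents; substitute a real tokenized corpus in practice.
    docs = [
        ["asymmetric", "dirichlet", "priors", "for", "topic", "models"],
        ["topic", "models", "with", "symmetric", "priors"],
        ["skewed", "word", "frequency", "distributions"],
    ]

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=2,
        alpha="auto",      # optimize an asymmetric document-topic prior from the data
        eta="symmetric",   # keep a fixed symmetric topic-word prior
        passes=20,
        random_state=0,
    )

    # lda.alpha holds one concentration parameter per topic; with
    # alpha="auto" these values are generally no longer equal.
    print(lda.alpha)
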
