Nested Hierarchical Dirichlet Process for Nonparametric Entity-Topic Analysis


Abstract

The Hierarchical Dirichlet Process (HDP) is a Bayesian nonparametric prior for grouped data, such as collections of documents, in which each group is a mixture over a set of shared mixture densities, or topics, and the number of topics is not fixed but grows with data size. The Nested Dirichlet Process (NDP) builds on the HDP to cluster the documents, but allows them to choose only from a set of specific topic mixtures. In many applications, such a set of topic mixtures may be identified with the set of entities for the collection. Often, however, multiple entities are associated with documents, and the set of entities may not be completely known in advance. In this paper, we address this problem using a nested HDP (nHDP), where the base distribution of an outer HDP is itself an HDP. The inner HDP creates a countably infinite set of topic mixtures and associates them with entities, while the outer HDP associates documents with these entities or topic mixtures. Making use of a nested Chinese Restaurant Franchise (nCRF) representation for the nested HDP, we propose a collapsed Gibbs sampling based inference algorithm for the model. Because of couplings between the two HDP levels, scaling up is naturally a challenge for the inference algorithm. We propose an inference algorithm by extending the direct sampling scheme of the HDP to two levels. In our experiments on two real-world research corpora, we show that, even when large fractions of author entities are hidden, the nHDP is able to generalize significantly better than existing models. More importantly, we are able to detect missing authors at a reasonable level of accuracy.
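
The abstract's remark that the number of topics "grows with data size" can be illustrated with the Chinese Restaurant Process (CRP), the single-level seating scheme that restaurant-franchise representations build on. The sketch below is not the paper's nCRF or its Gibbs sampler; it only simulates plain CRP seating, and the function name `crp_table_sizes` and the concentration parameter `alpha` are illustrative assumptions.

```python
import random


def crp_table_sizes(n_customers, alpha, seed=0):
    """Seat customers by a plain Chinese Restaurant Process.

    Returns a list whose k-th entry is the number of customers at table k.
    `alpha` (> 0) is the concentration parameter; names are hypothetical.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = customers currently seated at table k
    for n in range(n_customers):
        # With n customers already seated, the next one joins table k with
        # probability tables[k] / (n + alpha) and opens a new table with
        # probability alpha / (n + alpha).
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                break
        else:
            # r fell in the final slice of width alpha: open a new table.
            tables.append(1)
    return tables


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, len(crp_table_sizes(n, alpha=2.0)))
```

Under these assumptions the number of occupied tables (clusters, or topics in the analogy) never shrinks as more customers arrive, and in expectation it grows roughly logarithmically in the number of customers.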


Bibliographic details

Source: European Conference on Machine Learning and Knowledge Discovery in Databases, 2013, pp. 564-579 (16 pages)
Authors: Priyanka Agrawal; Lavanya Sita Tekumalla; Indrajit Bhattacharya
Original format: PDF

