Venue: 2011 IEEE International Workshop on Machine Learning for Signal Processing

Bayesian nonparametric modeling of hierarchical topics and sentences

Abstract

Automatically scoring the sentences of multiple documents plays an important role in document summarization. This study presents a new Bayesian nonparametric approach to unsupervised learning of a hierarchical topic and sentence model (HTSM). The HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP), where each sentence is assigned a hierarchical topic path. This establishes a tree structure whose distributions range from broad topics to precise topics, and it characterizes the dependencies among sentences. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures at the word level and the sentence level are estimated by unsupervised nonparametric processes based on the HDP and the nCRP, respectively. Whereas the original nCRP represents a document by a single path, the proposed HTSM uses a new nCRP in which multiple paths are incorporated to generate the different sentences of a document. A summarization system is developed to extract semantically rich sentences from documents, and a new Gibbs sampling algorithm is developed to infer the structural parameters of the HTSM. In experiments on the DUC corpus, the proposed HTSM outperforms the other methods for document summarization in terms of ROUGE measures.
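The abstract's central idea, that each sentence (rather than each document) receives its own root-to-leaf topic path, can be illustrated with a minimal sketch of drawing paths from a standard nCRP prior. This is not the authors' implementation: the function names, the fixed tree depth `depth`, and the concentration parameter `gamma` are assumptions made for illustration, and the sketch covers only the prior over paths, not the HDP word model or the Gibbs sampler.

```python
import random
from collections import defaultdict


def sample_ncrp_path(children, counts, depth, gamma, rng):
    """Draw one root-to-leaf path from a nested CRP prior and update counts.

    Nodes are tuples of child indices; the root is the empty tuple.
    `gamma` (assumed > 0) controls how readily a new branch is opened.
    """
    node = ()
    counts[node] += 1
    path = [node]
    for _ in range(depth - 1):
        kids = children[node]
        # Existing child k is chosen in proportion to the number of earlier
        # paths through it; a new child is chosen in proportion to gamma.
        weights = [counts[k] for k in kids] + [gamma]
        choice = rng.choices(range(len(kids) + 1), weights=weights)[0]
        if choice == len(kids):
            child = node + (len(kids),)
            kids.append(child)
        else:
            child = kids[choice]
        counts[child] += 1
        path.append(child)
        node = child
    return path


def sample_sentence_paths(num_sentences, depth=3, gamma=1.0, seed=0):
    """Hypothetical helper: one nCRP path per sentence, sharing a single tree."""
    rng = random.Random(seed)
    children = defaultdict(list)
    counts = defaultdict(int)
    paths = [
        sample_ncrp_path(children, counts, depth, gamma, rng)
        for _ in range(num_sentences)
    ]
    return paths, counts


if __name__ == "__main__":
    paths, counts = sample_sentence_paths(num_sentences=5)
    for i, path in enumerate(paths):
        print(f"sentence {i}: path {path}")
```

Because every sentence draws its own path through the shared tree, the sentences of one document can end up under different branches, which is the flexibility the abstract contrasts with the single-path-per-document nCRP.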
