In recent years, Latent Dirichlet Allocation (LDA) has been widely applied to document clustering, text classification, text segmentation, and even unsupervised query-based multi-document summarization. LDA is regarded as effective at modeling the shallow semantics of text. Building on previous work, this paper proposes LCAS, an LDA-based Conditional Random Field (CRF) automatic summarization method for supervised, extraction-based single-document summarization. The topics extracted by LDA are added as features to the CRF model during training, and we analyze how the number of topics affects the summarization results. Experiments show that adding LDA features effectively improves the quality of a CRF summarization system that takes only traditional features as input.
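The idea of feeding LDA topics to a CRF as extra features can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `sentence_features`, the feature names, and the choice of "traditional" features (position, length, first-sentence flag) are all assumptions made for illustration, and the per-sentence topic distributions are assumed to come from an already trained LDA model.

```python
from typing import Dict, List, Sequence


def sentence_features(
    sentences: Sequence[str],
    topic_dists: Sequence[Sequence[float]],
) -> List[Dict[str, float]]:
    """Build one feature dict per sentence for a sequence labeler such as a CRF.

    `topic_dists[i]` is assumed to be the LDA topic distribution inferred
    for sentence i (hypothetical input; the LDA step is not shown here).
    """
    if len(sentences) != len(topic_dists):
        raise ValueError("need exactly one topic distribution per sentence")

    n = len(sentences)
    features: List[Dict[str, float]] = []
    for i, (sent, dist) in enumerate(zip(sentences, topic_dists)):
        tokens = sent.split()
        # Illustrative "traditional" features (assumed, not from the paper).
        feats: Dict[str, float] = {
            "position": i / max(n - 1, 1),  # relative position in the document
            "length": float(len(tokens)),   # sentence length in tokens
            "is_first": 1.0 if i == 0 else 0.0,
        }
        # LDA features: one real-valued feature per topic.
        for k, p in enumerate(dist):
            feats[f"topic_{k}"] = float(p)
        # Indicator feature for the most probable topic, if any.
        if len(dist) > 0:
            best = max(range(len(dist)), key=lambda k: dist[k])
            feats[f"dominant_topic={best}"] = 1.0
        features.append(feats)
    return features
```

Each document then becomes a sequence of feature dicts, paired with a sequence of in-summary / not-in-summary labels for CRF training. Varying the length of each topic distribution corresponds to varying the number of topics, which is the setting the abstract says was analyzed.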