Conference: Twenty-First International Workshop on Database and Expert Systems Applications

Identifying Sentence-Level Semantic Content Units with Topic Models

Abstract

Statistical approaches to document content modeling typically focus either on broad topics or on discourse-level subtopics of a text. We present an analysis of the performance of probabilistic topic models on the task of learning sentence-level topics that are similar to facts. Identifying sentential content with the same meaning is an important task in multi-document summarization and in the evaluation of multi-document summaries. In our approach, each sentence is represented as a distribution over topics, and each topic is a distribution over words. We compare the topic-sentence assignments discovered by a topic model to gold-standard assignments that were manually annotated on a set of closely related pairs of news articles. We observe a clear correspondence between automatically identified and annotated topics. The high accuracy of automatically discovered topic-sentence assignments suggests that topic models can be used to identify (sub-)sentential semantic content units.
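The representation described in the abstract (each sentence as a distribution over topics, each topic as a distribution over words) can be illustrated with a small sketch. The abstract does not say which topic model or inference method the authors used, so the code below is an assumption: a standard LDA collapsed Gibbs sampler that treats every sentence as its own document. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy sentences are all hypothetical and chosen only for illustration.

```python
import random


def lda_gibbs(sentences, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA with one 'document' per sentence.

    Assumed setup, not the paper's: returns (theta, phi, vocab), where
    theta[s][k] is the topic distribution of sentence s and
    phi[k][w] is the word distribution of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    word_id = {w: i for i, w in enumerate(vocab)}
    docs = [[word_id[w] for w in s] for s in sentences]
    V, K = len(vocab), num_topics

    n_sk = [[0] * K for _ in docs]      # topic counts per sentence
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initial topic assignment for each token.
    for s, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            n_sk[s][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zs)

    for _ in range(iters):
        for s, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[s][i]
                n_sk[s][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (n_sk[s][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[s][i] = k
                n_sk[s][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates of both families of distributions.
    theta = [
        [(n_sk[s][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for s, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


# Toy sentences (invented for this sketch, not from the paper's corpus).
sentences = [
    "the storm flooded the coastal town".split(),
    "flooding from the storm hit the town".split(),
    "officials promised federal aid".split(),
    "federal aid was promised by officials".split(),
]
theta, phi, vocab = lda_gibbs(sentences, num_topics=2)

# Hard topic-sentence assignment: the most probable topic of each sentence.
assignments = [max(range(2), key=lambda k: row[k]) for row in theta]
print(assignments)
```

A hard assignment like `assignments` is one plausible way to obtain topic-sentence assignments that could then be compared against manually annotated ones. How the authors actually derived and scored their assignments is not stated in the abstract.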