Identifying Sentence-Level Semantic Content Units with Topic Models

Abstract

Statistical approaches to document content modeling typically focus either on broad topics or on discourse-level subtopics of a text. We present an analysis of the performance of probabilistic topic models on the task of learning sentence-level topics that are similar to facts. The identification of sentential content with the same meaning is an important task in multi-document summarization and the evaluation of multi-document summaries. In our approach, each sentence is represented as a distribution over topics, and each topic is a distribution over words. We compare the topic-sentence assignments discovered by a topic model to gold-standard assignments that were manually annotated on a set of closely related pairs of news articles. We observe a clear correspondence between automatically identified and annotated topics. The high accuracy of automatically discovered topic-sentence assignments suggests that topic models can be utilized to identify (sub-)sentential semantic content units.
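
The abstract represents each sentence as a distribution over topics and each topic as a distribution over words. The following is a minimal sketch of that representation. It assumes standard LDA fitted by collapsed Gibbs sampling, with every sentence treated as its own document.

Several details are illustrative assumptions and are not the authors' reported setup:
- the hyperparameters (`alpha`, `beta`, `iters`);
- the argmax rule that gives each sentence one topic;
- the toy sentences and gold labels;
- the Rand index used to compare predicted and gold assignments.

```python
import random
from itertools import combinations

def lda_sentences(sentences, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling, treating each sentence as a document.

    Returns (theta, phi, vocab): theta[d][k] estimates P(topic k | sentence d)
    and phi[k][w] estimates P(vocab[w] | topic k).
    Hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    docs = [[index[w] for w in s] for s in sentences]
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # topic counts per sentence
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total tokens per topic
    z = []                               # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    # Resample each token's topic from its full conditional:
    # p(k) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi, vocab

def rand_index(pred, gold):
    """Share of sentence pairs on which both labelings agree about whether the
    two sentences express the same content unit (label ids need not match).
    Used here as a stand-in evaluation measure (an assumption)."""
    pairs = list(combinations(range(len(gold)), 2))
    if not pairs:
        return 1.0
    agree = sum((pred[i] == pred[j]) == (gold[i] == gold[j]) for i, j in pairs)
    return agree / len(pairs)

# Hypothetical toy input: tokenized sentences from two related news reports,
# with hypothetical gold labels marking which sentences share a content unit.
sentences = [
    "earthquake struck the city on tuesday".split(),
    "a strong earthquake hit the city".split(),
    "rescue teams searched the rubble".split(),
    "teams searched collapsed buildings and rubble".split(),
]
gold = [0, 0, 1, 1]

theta, phi, vocab = lda_sentences(sentences, num_topics=2)
# Assign each sentence its most probable topic.
pred = [max(range(len(row)), key=row.__getitem__) for row in theta]
print("predicted:", pred, "Rand index vs. gold:", round(rand_index(pred, gold), 2))
```

On a corpus this small the sampled assignments can change with the seed. The sketch only illustrates the sentence-as-topic-mixture representation. It makes no attempt to reproduce the accuracy the abstract reports.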

Bibliographic Details

Source: Twenty-First International Workshop on Database and Expert Systems Applications, 2010, pp. 59-63 (5 pages)
Authors: Hennig Leonhard; Strecker Thomas; Narr Sascha; De Luca Ernesto William; Albayrak Sahin
Original format: PDF
Chinese Library Classification: TP311.13
Keywords: latent Dirichlet allocation; text summarization; topic models

Similar Literature

1. Combining topic modeling and SAO semantic analysis to identify technological opportunities of emerging technologies [J]. Ma Tingting, Zhou Xiao, Liu Jia. Technological Forecasting and Social Change, 2021, no. Deca.

2. Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation [J]. Camille Guinaudeau, Guillaume Gravier, Pascale Sebillot. Computer Speech and Language, 2012, no. 2.

3. Combine Topic Modeling with Semantic Embedding: Embedding Enhanced Topic Model [J]. Zhang Peng, Wang Suge, Li Deyu. IEEE Transactions on Knowledge and Data Engineering, 2020, no. 12.

4. Identifying Sentence-Level Semantic Content Units with Topic Models [C]. Hennig Leonhard, Strecker Thomas, Narr Sascha. Workshop on Database and Expert Systems Applications, 2010.

5. Semantically Enhanced Topic Modeling and Its Applications in Social Media [D]. Guo Lifan, 2013.

6. Revealing common disease mechanisms shared by tumors of different tissues of origin through semantic representation of genomic alterations and topic modeling [O]. Vicky Chen, John Paisley, Xinghua Lu, 2017.

7. A Semantic Patent Analysis Approach to Identifying Trends of Convergence Technology: Application of Topic Modeling and Cross-impact Analysis [O]. Byeongki Jeong, Jungwook Kim, Janghyeok Yoon, 2016.

