Centroid-Word Based Context Topic Model

常东亚 (Chang Dongya); 严建峰 (Yan Jianfeng); 杨璐 (Yang Lu)

Abstract

Latent Dirichlet allocation (LDA) is an effective tool for processing unstructured documents. However, it is built on the bag-of-words (BOW) assumption, which treats each document as an unordered collection of words and ignores both the order of documents relative to one another and the order of words within a document. To address this, and the limited accuracy of existing models, this paper proposes the centroid-word based context topic model. The model rests on the idea that the topic of a word in a document is more closely related to the topics of the few words near it. When computing the topic distribution of each word, the model takes that word as the center, extends a fixed number of words before and after it to form a window, and then performs the computation on each window. Because the windows follow one another through the document, this approach induces an order among the windows and hence a local order among the words. In addition, since each word has a different context, the topic distribution of a word depends on its position in the document. Experiments show that the centroid-word based context topic model achieves higher accuracy and faster convergence on unseen datasets.
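The windowing idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' inference procedure: the half-window size `k`, the function names `centroid_windows` and `window_topic_distribution`, and the rule of estimating a word's topic distribution from the topic assignments inside its window with a hypothetical symmetric smoothing parameter `alpha` are all assumptions introduced here for illustration.

```python
from collections import Counter


def centroid_windows(tokens, k):
    """Return one window per token: the center word plus up to k neighbors
    on each side, truncated at the document boundaries.

    The windows come out in document order, so consecutive windows overlap
    and impose a local order on the words.
    """
    windows = []
    for i in range(len(tokens)):
        lo = max(0, i - k)
        hi = min(len(tokens), i + k + 1)
        windows.append(tokens[lo:hi])
    return windows


def window_topic_distribution(topic_assignments, center, k, num_topics, alpha=0.1):
    """Hypothetical per-word topic distribution: smoothed counts of the current
    topic assignments inside the window centered on position `center`.

    ASSUMPTION: this counting-plus-smoothing rule is illustrative only and is
    not taken from the paper.
    """
    lo = max(0, center - k)
    hi = min(len(topic_assignments), center + k + 1)
    counts = Counter(topic_assignments[lo:hi])
    total = (hi - lo) + num_topics * alpha
    return [(counts[t] + alpha) / total for t in range(num_topics)]


if __name__ == "__main__":
    doc = ["topic", "models", "ignore", "word", "order"]
    print(centroid_windows(doc, 1))
    # [['topic', 'models'], ['topic', 'models', 'ignore'], ['models', 'ignore', 'word'], ['ignore', 'word', 'order'], ['word', 'order']]

    # Two positions in the same document get different distributions
    # because their windows (contexts) differ.
    z = [0, 0, 1, 1, 2]
    print(window_topic_distribution(z, center=1, k=1, num_topics=3))
    print(window_topic_distribution(z, center=3, k=1, num_topics=3))
```

Because each window is indexed by its center position, the same word type can receive different topic distributions at different positions in a document, which mirrors the position dependence the abstract describes.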

Bibliographic Details

Source
Application Research of Computers (计算机应用研究) | 2018, No. 4 | pp. 1005-1009 | 5 pages
Authors
常东亚 (Chang Dongya); 严建峰 (Yan Jianfeng); 杨璐 (Yang Lu)
Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China (all three authors)

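Original format: PDF
Language of full text: Chinese
CLC classification: Information processing (information handling)
Keywords
latent Dirichlet allocation; topic model; context information
Date added to database: 2023-07-24 18:55:20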

Similar Literature

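1. Entity disambiguation method based on contextual word embeddings and topic models [J]. Wang Rui, Li Bicheng, Du Wenqian. Journal of Chinese Information Processing (中文信息学报), 2019, No. 11.
2. Research on microblog hot-topic detection based on central words and LDA [J]. Liu Gan, Lin Jiehao, Zhai Wenyi. Journal of Intelligence (情报杂志), 2021, No. 5.
3. Research on Chinese predicate head-word identification based on a Highway-BiLSTM network [J]. Huang Ruizhang, Jin Wenfan, Chen Yanping. Journal on Communications (通信学报), 2021, No. 1.
4. Research on head-word extraction based on frequent dependency subtree patterns [J]. Tian Weidong, Yu Yongyong. Journal of Chinese Information Processing (中文信息学报), 2016, No. 3.
5. Head-word identification based on CRF and error-driven learning [J]. Tian Weidong, Li Yajuan. Application Research of Computers (计算机应用研究), 2013, No. 8.
6. Head-word-driven terminology translation [C]. Ma Lili, Cai Dongfeng, Zhou Lanhai. 2009 National Conference on Pattern Recognition and First China-Japan-Korea Joint Workshop on Pattern Recognition, 2009.
7. Context-based topic model [A]. Chang Dongya. 2017.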