Venue: Federated Conference on Computer Science and Information Systems

Extracting semantic prototypes and factual information from a large scale corpus using variable size window topic modelling

Abstract

In this paper, a model of textual events composed of a mixture of semantic stereotypes and factual information is proposed. A method is introduced that automatically distinguishes general semantic prototypes, which describe broad categories of events, from factual elements specific to a given event. Next, the paper presents the results of an unsupervised topic-extraction experiment performed on documents from a large-scale corpus with an additional temporal structure. The experiment compares the kind of information provided by Latent Dirichlet Allocation (LDA) and by vector space modelling with Log-Entropy weights. The paper also shows how using different time windows of the corpus affects the results of topic modelling. Finally, it discusses whether unsupervised topic modelling can reflect deeper semantic information, such as elements describing a given event or its causes and results, and distinguish it from purely factual data.
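The abstract contrasts LDA with a vector space model weighted by Log-Entropy. As a rough illustration of what that weighting does, the sketch below applies the standard Log-Entropy scheme (local weight log(1 + tf), multiplied by a global entropy weight) to a small term-document count matrix. It is not taken from the paper: the function name `log_entropy_weights`, the matrix layout (rows are terms, columns are documents) and the fallback for degenerate inputs are assumptions made for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]


def log_entropy_weights(counts: Matrix) -> Matrix:
    """Apply standard Log-Entropy weighting to a term-document count matrix.

    Assumed layout (illustrative): counts[i][j] is the raw frequency of
    term i in document j. Each weight is w_ij = g_i * log(1 + tf_ij), where
    g_i = 1 + sum_j p_ij * log(p_ij) / log(n_docs) and p_ij = tf_ij / gf_i.
    """
    n_docs = len(counts[0]) if counts else 0
    weighted: Matrix = []
    for row in counts:
        gf = sum(row)  # global frequency of this term across all documents
        if gf == 0 or n_docs < 2:
            # Entropy is undefined here (log(1) == 0 in the denominator),
            # so this sketch falls back to the local weight alone.
            g = 1.0
        else:
            # Sum of p*log(p) over documents containing the term (<= 0).
            entropy = sum((tf / gf) * math.log(tf / gf) for tf in row if tf > 0)
            # g is 0 for a term spread evenly over all documents and
            # 1 for a term that occurs in only one document.
            g = 1.0 + entropy / math.log(n_docs)
        weighted.append([g * math.log1p(tf) for tf in row])
    return weighted


if __name__ == "__main__":
    counts = [
        [2, 2, 2],  # term spread evenly over all three documents
        [0, 3, 0],  # term concentrated in a single document
    ]
    for row in log_entropy_weights(counts):
        print([round(w, 3) for w in row])
```

In this toy matrix the evenly spread term receives a global weight of (numerically almost exactly) zero, while the term confined to one document keeps its full local weight log(4). This is the sense in which Log-Entropy favours document-specific vocabulary over terms that are common to the whole corpus.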