Venue: IEEE International Conference on Data Mining (ICDM)

TopicOcean: An Ever-Increasing Topic Model with Meta-Learning



Abstract

Topic modeling has been intensively studied and widely applied in both academia and industry over the last decade. In the literature, a topic model usually has to be trained from scratch for each individual corpus, so the wisdom of the crowd (i.e., topic models previously trained on other corpora) is discarded. Because obtaining a high-quality topic model involves a massive amount of in-domain data, considerable computational cost, and human labour, training from scratch for every new corpus is a huge waste of resources. In this paper, we propose the novel TopicOcean framework, which integrates well-trained topic models and transfers the knowledge in the accumulated topics to new corpora in order to improve the quality of their topic models. We first propose a method for constructing the ever-increasing TopicOcean, and then propose a meta-learning mechanism that transfers the meta-level knowledge (i.e., topics) in TopicOcean to topic modeling on new corpora. Comprehensive experiments show that the TopicOcean framework significantly outperforms state-of-the-art methods (a 53.77% perplexity improvement on a temporal-shift corpus and a 29.24% improvement on a domain-shift corpus). The well-trained, high-quality topic models used to construct TopicOcean have been open-sourced to promote further research.
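The abstract reports its gains as perplexity improvements but does not describe the evaluation protocol. As a rough illustration only, the sketch below computes the standard per-token held-out perplexity for a mixture-of-topics model, where each document has topic proportions `theta` and each topic has a word distribution `phi`, so that p(w | d) = sum_k theta[k] * phi[k][w]. It also assumes that a "perplexity improvement" means a relative reduction against a baseline. The helper names (`token_prob`, `corpus_perplexity`, `relative_improvement`) and the toy numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

# Hypothetical helpers for illustration; not TopicOcean's actual evaluation code.

def token_prob(word: int, theta: Sequence[float], phi: Sequence[Sequence[float]]) -> float:
    """p(w | d) = sum_k theta[k] * phi[k][w] for a mixture-of-topics model."""
    return sum(t * phi_k[word] for t, phi_k in zip(theta, phi))

def corpus_perplexity(docs: Sequence[Sequence[int]],
                      thetas: Sequence[Sequence[float]],
                      phi: Sequence[Sequence[float]]) -> float:
    """exp(-(total held-out log-likelihood) / (total number of tokens))."""
    total_ll = 0.0
    n_tokens = 0
    for doc, theta in zip(docs, thetas):
        for w in doc:
            total_ll += math.log(token_prob(w, theta, phi))
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("no tokens to evaluate")
    return math.exp(-total_ll / n_tokens)

def relative_improvement(baseline_ppl: float, new_ppl: float) -> float:
    """Percentage reduction in perplexity relative to a baseline (lower is better)."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl

if __name__ == "__main__":
    # Toy vocabulary of 4 word ids and 2 topics (made-up numbers).
    phi = [
        [0.25, 0.25, 0.25, 0.25],  # topic 0: uniform over the vocabulary
        [0.70, 0.10, 0.10, 0.10],  # topic 1: concentrated on word 0
    ]
    docs = [[0, 0, 1], [0, 2, 3]]
    flat = corpus_perplexity(docs, [[1.0, 0.0], [1.0, 0.0]], phi)    # about 4
    peaked = corpus_perplexity(docs, [[0.0, 1.0], [0.0, 1.0]], phi)  # lower than flat
    print(flat, peaked, relative_improvement(flat, peaked))
```

A uniform topic over a 4-word vocabulary gives a perplexity of about 4, which is a useful sanity check: perplexity can be read as the effective number of equally likely word choices per token, so lower values mean the model predicts held-out text better.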
