Knowledge-Based Systems > Learning document representation via topic-enhanced LSTM model
Learning document representation via topic-enhanced LSTM model

Abstract

Document representation plays an important role in text mining, natural language processing, and information retrieval. Traditional approaches to document representation may disregard the correlations or order of words in a document, owing to unrealistic assumptions of word independence or exchangeability. Recently, long short-term memory (LSTM) based recurrent neural networks have proven effective at preserving local contextual sequential patterns of words in a document, but the LSTM model alone may be inadequate for capturing the global topical semantics needed to learn document representations. In this work, we propose a new topic-enhanced LSTM model for the document representation problem. We first employ an attention-based LSTM model to generate a hidden representation of the word sequence in a given document. Then, we introduce a latent topic modeling layer with a similarity constraint on the local hidden representation, and build a tree-structured LSTM on top of the topic layer to generate a semantic representation of the document. We evaluate our model on typical text mining applications, i.e., document classification, topic detection, information retrieval, and document clustering. Experimental results on real-world datasets show the benefit of our innovations over state-of-the-art baseline methods. (C) 2019 Elsevier B.V. All rights reserved.
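The abstract does not give the paper's exact formulation of the attention-based LSTM layer, but the general idea of attention-weighted aggregation of word-level hidden states into a document-level vector can be sketched as follows. This is a minimal, framework-free illustration: the hidden states, context vector, and `attention_pool` helper are all hypothetical stand-ins, not the authors' actual model components.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, context):
    # Score each word-level hidden state against a (learned) context
    # vector via a dot product, normalize the scores with softmax,
    # and return the attention-weighted sum as the document vector.
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights

# Toy example: three word-level hidden states of dimension 2,
# as an LSTM encoder might produce for a three-word document.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [1.0, 1.0]
doc_vec, weights = attention_pool(hidden, context)
```

In a real model the hidden states would come from an LSTM encoder and the context vector would be trained jointly with the rest of the network; the third state here scores highest against the context vector, so it receives the largest attention weight.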