
Spherical Paragraph Model



Abstract

Representing texts as fixed-length vectors is central to many language processing tasks. Most traditional methods build text representations on top of the simple Bag-of-Words (BoW) representation, which loses the rich semantic relations between words. Recent advances in natural language processing have shown that semantically meaningful representations of words can be efficiently acquired by distributed models, making it possible to build text representations on a better foundation, the Bag-of-Word-Embedding (BoWE) representation. However, existing text representation methods using BoWE often lack sound probabilistic foundations or cannot adequately capture the semantic relatedness encoded in word vectors. To address these problems, we introduce the Spherical Paragraph Model (SPM), a probabilistic generative model based on BoWE, for text representation. SPM has good probabilistic interpretability and can fully leverage the rich semantics of words, word co-occurrence information, and corpus-wide information to support the representation learning of texts. Experimental results on topical classification and sentiment analysis demonstrate that SPM achieves new state-of-the-art performance on several benchmark datasets.
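The abstract does not spell out SPM's generative process, so the following is only a minimal sketch of the BoWE view it builds on: a document is treated as a bag of L2-normalized word embeddings, and a fixed-length document vector is obtained as their mean direction on the unit sphere (equivalently, the maximum-likelihood mean direction of a von Mises-Fisher distribution fitted to those vectors). The toy embedding table, the 50-dimensional size, and the von Mises-Fisher reading of "spherical" are illustrative assumptions, not the paper's actual model.

import numpy as np

# Toy stand-in for pretrained word embeddings (hypothetical 50-d vectors).
rng = np.random.default_rng(0)
vocab = ["good", "great", "movie", "boring", "plot"]
embeddings = {w: rng.normal(size=50) for w in vocab}

def bowe(tokens, embeddings):
    # Bag-of-Word-Embeddings: the document as the multiset of its words'
    # embedding vectors, each projected onto the unit sphere.
    vecs = np.stack([embeddings[t] for t in tokens if t in embeddings])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def paragraph_direction(unit_vecs):
    # Mean direction of the unit vectors: the normalized resultant vector,
    # which is the maximum-likelihood mean direction of a von Mises-Fisher
    # distribution over those vectors.
    resultant = unit_vecs.sum(axis=0)
    return resultant / np.linalg.norm(resultant)

doc = ["good", "movie", "great", "plot"]
doc_vec = paragraph_direction(bowe(doc, embeddings))
print(doc_vec.shape)  # (50,) -- a fixed-length representation of the document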
