Conference: Workshop on Vector Space Modeling for Natural Language Processing

Learning Distributed Representations for Multilingual Text Sequences

Abstract

We propose a novel approach to learning distributed representations of variable-length text sequences in multiple languages simultaneously. Unlike previous work, which often derives representations of multi-word sequences as weighted sums of individual word vectors, our model learns distributed representations for phrases and sentences as a whole. Our work is similar in spirit to the recent paragraph vector approach but extends it to the bilingual context so as to efficiently encode meaning-equivalent text sequences of multiple languages in the same semantic space. Our learned embeddings achieve state-of-the-art performance on the widely used cross-lingual document classification (CLDC) task, with an accuracy of 92.7 for English to German and 91.5 for German to English. By learning text sequence representations as a whole, our model performs almost equally well in both classification directions of the CLDC task, which past work did not achieve.
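
The two classification directions mentioned in the abstract can be illustrated with a toy sketch: a classifier is trained on document vectors from one language and tested on document vectors from the other, which only works if both languages share one embedding space. The vectors, the labels, and the nearest-centroid classifier below are all made up for illustration; they are assumptions, not the paper's embeddings, data, or classifier.

```python
# Toy illustration of the CLDC protocol: train on one language, test on the other.
# All vectors, labels, and the nearest-centroid classifier are hypothetical;
# none of them come from the paper.
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def train_centroids(docs):
    """docs is a list of (vector, label) pairs. Returns {label: centroid}."""
    by_label = {}
    for vec, label in docs:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def accuracy(centroids, docs):
    """Fraction of (vector, label) pairs that the centroids classify correctly."""
    correct = sum(predict(centroids, vec) == label for vec, label in docs)
    return correct / len(docs)


# Made-up document embeddings, assumed to live in one shared bilingual space.
english_docs = [([0.9, 0.1], "economy"), ([0.8, 0.2], "economy"),
                ([0.1, 0.9], "sports"), ([0.2, 0.8], "sports")]
german_docs = [([0.85, 0.15], "economy"), ([0.7, 0.3], "economy"),
               ([0.15, 0.85], "sports"), ([0.3, 0.7], "sports")]

# English to German: train on English vectors, test on German vectors.
en_to_de = accuracy(train_centroids(english_docs), german_docs)
# German to English: the reverse direction.
de_to_en = accuracy(train_centroids(german_docs), english_docs)
print(en_to_de, de_to_en)
```

Reporting both directions separately, as the abstract does, shows whether the shared space is symmetric: a model whose embeddings are well aligned for only one language would score noticeably lower in one direction.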