Learning Distributed Representations for Multilingual Text Sequences

Abstract

We propose a novel approach to learning distributed representations of variable-length text sequences in multiple languages simultaneously. Unlike previous work, which often derives representations of multi-word sequences as weighted sums of individual word vectors, our model learns distributed representations for phrases and sentences as a whole. Our work is similar in spirit to the recent paragraph vector approach but extends it to the bilingual context so as to efficiently encode meaning-equivalent text sequences of multiple languages in the same semantic space. Our learned embeddings achieve state-of-the-art performance on the commonly used cross-lingual document classification (CLDC) task, with accuracies of 92.7% for English to German and 91.5% for German to English. By learning text sequence representations as a whole, our model performs equally well in both classification directions of the CLDC task, something past work did not achieve.
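The abstract describes learning one representation per whole text sequence, as in paragraph vector, and sharing it across a translation pair. The sketch below illustrates that idea under several assumptions. It uses a PV-DBOW-style objective with negative sampling; the abstract does not specify the exact objective, sampling scheme, or hyperparameters.

In the sketch, each parallel sentence pair owns a single vector, and that vector is trained to predict the words of both its English and its German sentence. Unseen monolingual text is embedded the way paragraph vector does inference: the word vectors are frozen and a fresh sequence vector is fitted. The function names, the toy corpus, and every hyperparameter are hypothetical, and nothing here reproduces the reported results.

```python
import math
import random

# Hypothetical sketch of a bilingual PV-DBOW objective with negative sampling.
# Not the authors' implementation; objective and hyperparameters are assumptions.

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a) * dot(b, b)) + 1e-12)

def sgd_step(vec, out_vecs, lang, words, vocab, lr, neg, rng, update_out):
    """Train `vec` to predict each word in `words` against `neg` sampled negatives.
    Updates `vec` in place (and the output word vectors if update_out); returns the loss."""
    loss = 0.0
    for w in words:
        candidates = [c for c in vocab[lang] if c != w]
        targets = [(w, 1.0)] + [(n, 0.0) for n in rng.sample(candidates, neg)]
        grad = [0.0] * len(vec)
        for word, label in targets:
            u = out_vecs[(lang, word)]
            p = sigmoid(dot(vec, u))
            loss -= math.log(max(p if label else 1.0 - p, 1e-12))
            g = lr * (label - p)
            for j in range(len(vec)):
                grad[j] += g * u[j]  # uses u before its own update
                if update_out:
                    u[j] += g * vec[j]
        for j in range(len(vec)):
            vec[j] += grad[j]
    return loss

def train_bilingual_pv(pairs, dim=16, epochs=200, lr=0.1, neg=2, seed=0):
    """pairs: list of (english_tokens, german_tokens) translation pairs."""
    rng = random.Random(seed)
    vocab = {"en": sorted({w for en, _ in pairs for w in en}),
             "de": sorted({w for _, de in pairs for w in de})}
    # Output word vectors start at zero (as in word2vec); one input vector per PAIR,
    # learned as a free parameter rather than as a weighted sum of word vectors.
    out_vecs = {(lang, w): [0.0] * dim for lang in vocab for w in vocab[lang]}
    sent_vecs = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in pairs]
    history = []  # total loss per epoch
    for _ in range(epochs):
        total = 0.0
        for vec, (en, de) in zip(sent_vecs, pairs):
            # The same vector must predict the words of both languages, which
            # places translation-equivalent sentences at the same point.
            total += sgd_step(vec, out_vecs, "en", en, vocab, lr, neg, rng, True)
            total += sgd_step(vec, out_vecs, "de", de, vocab, lr, neg, rng, True)
        history.append(total)
    return sent_vecs, out_vecs, vocab, history

def infer_vector(words, lang, out_vecs, vocab, steps=100, lr=0.1, neg=2, seed=1):
    """Embed monolingual text: freeze word vectors, fit a fresh sequence vector."""
    rng = random.Random(seed)
    dim = len(next(iter(out_vecs.values())))
    vec = [(rng.random() - 0.5) / dim for _ in range(dim)]
    for _ in range(steps):
        sgd_step(vec, out_vecs, lang, words, vocab, lr, neg, rng, update_out=False)
    return vec

# Usage on a hypothetical toy corpus: a German query should land near its pair.
pairs = [(["the", "cat", "sleeps"], ["die", "katze", "schläft"]),
         (["a", "dog", "barks"], ["ein", "hund", "bellt"]),
         (["birds", "sing", "loudly"], ["vögel", "singen", "laut"])]
sent_vecs, out_vecs, vocab, history = train_bilingual_pv(pairs)
query = infer_vector(["die", "katze", "schläft"], "de", out_vecs, vocab)
nearest = max(range(len(pairs)), key=lambda i: cosine(query, sent_vecs[i]))
print("nearest pair:", pairs[nearest][0])
```

In a CLDC-style setup, a classifier trained on vectors inferred from documents in one language could be applied directly to vectors inferred from the other language, because both live in the shared space. That transfer step is not shown here.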

Bibliographic Record

Source
Workshop on Vector Space Modeling for Natural Language Processing, 2015, 7 pages
Authors
Hieu Pham; Minh-Thang Luong; Christopher D. Manning;
Original format: PDF
Chinese Library Classification: linear space theory (vector spaces)

Similar Documents

1. Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions [J]. Hai-Cheng Yi, Zhu-Hong You, Li Cheng. Computational and Structural Biotechnology Journal. 2020, No. 1.

2. Learning distributed representations of RNA sequences and its application for predicting RNA-protein binding sites with a convolutional neural network [J]. Pan Xiaoyong, Shen Hong-Bin. Neurocomputing. 2018, issue Aug. 30.

3. Learning Multilevel Distributed Representations for High-Dimensional Sequences [J]. Geoffrey Hinton, Ilya Sutskever. JMLR: Workshop and Conference Proceedings. 2007.

4. Learning Distributed Representations for Multilingual Text Sequences [C]. Hieu Pham, Minh-Thang Luong, Christopher D. Manning. 1st Workshop on Vector Space Modeling for Natural Language Processing 2015. 2015.

5. SigSpace-Text: Parallel and distributed signature learning in text analytics [D]. Bandi, Rakesh Reddy. 2016.

6. Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions [O]. Hai-Cheng Yi, Zhu-Hong You, Li Cheng. 2020.

7. Learning Distributed Representations for Multilingual Text Sequences [O]. Hieu Pham, Minh-Thang Luong, Christopher D. Manning. 2016.