
A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments



Abstract

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set (sentence IDs) accounts for a significant performance gap among these algorithms. This feature set is also used by traditional alignment algorithms, such as IBM Model-1, which demonstrate similar performance to state-of-the-art embedding algorithms on a variety of benchmarks. Overall, we observe that different algorithmic approaches for utilizing the sentence ID feature space result in similar performance. This paper draws both empirical and theoretical parallels between the embedding and alignment literature, and suggests that adding additional sources of information, which go beyond the traditional signal of bilingual sentence-aligned corpora, may substantially improve cross-lingual word embeddings, and that future baselines should at least take such features into account.
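As a rough illustration of the kind of traditional alignment baseline the abstract mentions, the sketch below shows the standard EM training loop of IBM Model-1 over a toy sentence-aligned corpus. This is not code from the paper; the function name, the toy sentence pairs, and the iteration count are illustrative assumptions only.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """EM estimation of IBM Model-1 translation probabilities t(f|e).

    `pairs` is a list of (source_tokens, target_tokens) sentence pairs.
    A NULL token is added to the target side, as is standard.
    """
    NULL = "<null>"
    # Uniform initialization over the source vocabulary.
    src_vocab = {f for src, _ in pairs for f in src}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # t[(f, e)]

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for src, tgt in pairs:
            tgt = tgt + [NULL]
            for f in src:
                # Normalize over all possible target-side alignments of f.
                norm = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: re-estimate t(f|e) from expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy usage on two aligned sentence pairs.
pairs = [(["das", "haus"], ["the", "house"]),
         (["das", "buch"], ["the", "book"])]
t = train_ibm_model1(pairs)
print(t[("das", "the")])  # dominates the alternatives after a few iterations
```

The only signal this baseline exploits is which words co-occur within the same aligned sentence pair, i.e. the sentence ID feature set discussed in the abstract.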
