Venue: Workshop on continuous vector space models and their compositionality 2013

Letter N-Gram-based Input Encoding for Continuous Space Language Models

Abstract

We present a letter-based encoding for words in continuous space language models. Instead of using the word index, we represent each word entirely by its letter n-grams, so that similar words automatically receive similar representations. With this encoding we hope to generalize better to unknown or rare words and to capture morphological information. We show its influence on machine translation using continuous space language models based on restricted Boltzmann machines. We evaluate translation quality and training time on a German-to-English translation task of TED and university lectures, and on the news translation task from English to German. With the new approach, a gain of up to 0.4 BLEU points can be achieved.
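
As a rough illustration of the idea in the abstract, the following Python sketch encodes a word as a binary vector over its letter n-grams rather than as a single word index. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `#` boundary marker, the choice of n = 3, and the binary (rather than count-based) features are all hypothetical choices made for this example.

```python
from collections import Counter


def letter_ngrams(word, n=3, boundary="#"):
    """Count the letter n-grams of a word.

    The boundary marker is an assumption of this sketch; it lets
    prefixes and suffixes produce distinct n-grams.
    """
    padded = boundary + word + boundary
    if len(padded) < n:
        # Word too short for a full n-gram: keep the padded string itself.
        return Counter([padded])
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def build_ngram_index(words, n=3):
    """Map every letter n-gram seen in `words` to a vector position."""
    grams = sorted({g for w in words for g in letter_ngrams(w, n)})
    return {g: i for i, g in enumerate(grams)}


def encode(word, ngram_index, n=3):
    """Binary input vector: 1 where the word contains the n-gram."""
    vec = [0] * len(ngram_index)
    for g in letter_ngrams(word, n):
        if g in ngram_index:  # n-grams unseen at build time are skipped
            vec[ngram_index[g]] = 1
    return vec


def shared_features(a, b):
    """Number of n-gram features active in both vectors."""
    return sum(x & y for x, y in zip(a, b))


if __name__ == "__main__":
    index = build_ngram_index(["word", "words", "house"])
    w, ws, h = (encode(x, index) for x in ("word", "words", "house"))
    print(shared_features(w, ws))  # "word" vs "words"
    print(shared_features(w, h))   # "word" vs "house"
```

In this toy setup "word" and "words" share several active features while "word" and "house" share none, which is the property the abstract relies on: morphologically related or rare words still activate n-grams seen in other words, whereas distinct word indices would have nothing in common.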