
Cross-Lingual Word Embeddings for Low-Resource Language Modeling


Abstract

Most languages have no established writing system and minimal written records. However, textual data is essential for natural language processing, and particularly important for training language models to support speech recognition. Even in cases where text data is missing, there are some languages for which bilingual lexicons are available, since creating lexicons is a fundamental task of documentary linguistics. We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences. The method involves learning cross-lingual word embeddings as a preliminary step in training monolingual language models. Results across a number of languages show that language models are improved by this pre-training. Application to Yongning Na, a threatened language, highlights challenges in deploying the approach in real low-resource environments.
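To make the pre-training step concrete, below is a minimal sketch (not the authors' implementation) of the idea described in the abstract: cross-lingual word embeddings, learned in an earlier step, initialize the embedding layer of a monolingual recurrent language model before it is trained on the small target-language corpus. The file name crosslingual.vec, the toy vocabulary, and all hyperparameters are hypothetical, and PyTorch is assumed as the framework.

```python
import torch
import torch.nn as nn

def load_pretrained(path, vocab, dim):
    """Read word vectors in word2vec text format into a weight matrix
    aligned with `vocab`; words absent from the file keep a random init."""
    weights = torch.randn(len(vocab), dim) * 0.1
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in vocab and len(vec) == dim:
                weights[vocab[word]] = torch.tensor([float(x) for x in vec])
    return weights

class RNNLM(nn.Module):
    """LSTM language model whose embedding layer can be pre-initialized."""
    def __init__(self, vocab_size, dim, hidden):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.out(h)

# Hypothetical usage: `vocab` maps target-language words to ids, and
# "crosslingual.vec" holds vectors from the cross-lingual embedding step.
vocab = {"<unk>": 0, "the": 1, "house": 2}
dim, hidden = 100, 256
model = RNNLM(len(vocab), dim, hidden)
model.embed.weight.data.copy_(load_pretrained("crosslingual.vec", vocab, dim))
model.embed.weight.requires_grad = False  # optionally freeze pre-trained vectors
# ...then train `model` as a normal next-word LM on the ~1k-sentence corpus.
```

Whether the pre-trained embeddings are frozen or fine-tuned during language-model training is a design choice not settled by the abstract; with only a thousand training sentences, freezing them is one plausible way to limit overfitting.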
