Most languages have no established writing system and minimal written records. However, textual data is essential for natural language processing, and particularly important for training language models to support speech recognition. Even in cases where text data is missing, there are some languages for which bilingual lexicons are available, since creating lexicons is a fundamental task of documentary linguistics. We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences. The method involves learning cross-lingual word embeddings as a preliminary step in training monolingual language models. Results across a number of languages show that language models are improved by this pre-training. Application to Yongning Na, a threatened language, highlights challenges in deploying the approach in real low-resource environments.
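As a rough illustration of the pre-training idea, the sketch below learns a least-squares linear map from target-language embeddings into a source-language embedding space using bilingual lexicon pairs (in the style of linear cross-lingual mapping methods); the mapped vectors could then initialise the embedding layer of a monolingual language model. All vocabularies, word pairs, and vectors here are invented toy data, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Source-language embeddings, assumed pretrained on plentiful text
# (random vectors stand in for real pretrained embeddings).
src_vocab = ["dog", "cat", "house", "water", "sun", "moon"]
src_emb = {w: rng.normal(size=dim) for w in src_vocab}

# Target-language embeddings, assumed trained on only ~1000 sentences.
tgt_vocab = ["chien", "chat", "maison", "eau", "soleil", "lune"]
tgt_emb = {w: rng.normal(size=dim) for w in tgt_vocab}

# Bilingual lexicon pairs (target, source); "lune" is deliberately
# excluded to show that the map generalises beyond the lexicon.
lexicon = [("chien", "dog"), ("chat", "cat"), ("maison", "house"),
           ("eau", "water"), ("soleil", "sun")]

# Fit W minimising ||X W - Y||^2 over the lexicon pairs.
X = np.stack([tgt_emb[t] for t, _ in lexicon])
Y = np.stack([src_emb[s] for _, s in lexicon])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Every target word, in the lexicon or not, now has a vector in the
# shared space; these would seed the language model's embedding layer.
mapped = {w: tgt_emb[w] @ W for w in tgt_vocab}
```

The linear map is deliberately simple: with more lexicon entries than embedding dimensions it becomes a genuine regression rather than an exact fit, which is the realistic regime for documentary lexicons.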