Improving Multilingual Models with Language-Clustered Vocabularies


Abstract

State-of-the-art multilingual models depend on vocabularies that cover all of the languages the model is expected to see at inference time, but the standard methods for generating those vocabularies are not ideal for massively multilingual applications. In this work, we introduce a novel procedure for multilingual vocabulary generation that combines the separately trained vocabularies of several automatically derived language clusters, thus balancing the trade-off between cross-lingual subword sharing and language-specific vocabularies. Our experiments show improvements across languages on the key multilingual benchmark tasks TYDI QA (+2.9 F1), XNLI (+2.1%), and WikiAnn NER (+2.8 F1), and a factor-of-8 reduction in out-of-vocabulary rate, all without increasing the size of the model or data.
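As a rough illustration of the idea in the abstract (one vocabulary trained per language cluster, then combined), the sketch below may help. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names `train_vocab`, `clustered_vocab`, and `oov_rate` are hypothetical, the "trainer" is a whitespace-token frequency cutoff standing in for a real subword algorithm, and the clusters are passed in by hand rather than derived automatically.

```python
from collections import Counter


def train_vocab(corpus, size):
    """Toy stand-in for a subword trainer (an assumption, not the paper's method):
    keep the `size` most frequent whitespace tokens, ties broken alphabetically."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok for tok, _ in ranked[:size]}


def clustered_vocab(corpora, clusters, size_per_cluster):
    """Train one vocabulary per language cluster and return their union.

    corpora:  dict mapping a language code to a list of sentences
    clusters: list of lists of language codes (given by hand here; the
              abstract says the clusters are derived automatically)
    """
    vocab = set()
    for langs in clusters:
        # Pool the data of all languages in the cluster, then train once.
        pooled = [sent for lang in langs for sent in corpora[lang]]
        vocab |= train_vocab(pooled, size_per_cluster)
    return vocab


def oov_rate(corpus, vocab):
    """Fraction of whitespace tokens in `corpus` that are missing from `vocab`."""
    tokens = [tok for sent in corpus for tok in sent.split()]
    return sum(tok not in vocab for tok in tokens) / len(tokens)
```

In this toy setup each cluster gets its own vocabulary budget, so a low-resource cluster is not crowded out by tokens from a larger one, while languages inside a cluster still share entries.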


Bibliographic record

Source
Conference on Empirical Methods in Natural Language Processing | 2020 | pp. 4536-4546 | 11 pages
Authors
Hyung Won Chung; Dan Garrette; Kiat Chuan Tan; Jason Riesa
Original format: PDF
Indexed: 2022-08-26 13:54:32

