Venue: SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

Abstract

A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language, such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low-resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrated the utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, and Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems, which achieved over 90% mean accuracy on them, while others were more challenging.
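
The per-family mean accuracy mentioned in the abstract can be illustrated with a minimal sketch. It assumes exact-match accuracy per language, averaged with equal weight over the languages in each family; the abstract does not state the averaging scheme, so this is an assumption. The function name `family_mean_accuracy` and the placeholder forms are hypothetical and are not taken from the shared task data or its official evaluation script.

```python
from collections import defaultdict


def family_mean_accuracy(predictions, gold, family_of):
    """Return {family: mean of per-language exact-match accuracies}.

    predictions, gold: dict mapping language code -> list of inflected forms.
    family_of: dict mapping language code -> language family name.
    Assumption: each language counts equally within its family (macro-average).
    """
    per_language = {}
    for lang, gold_forms in gold.items():
        pred_forms = predictions[lang]
        # A prediction counts as correct only if it matches the gold form exactly.
        correct = sum(p == g for p, g in zip(pred_forms, gold_forms))
        per_language[lang] = correct / len(gold_forms)

    by_family = defaultdict(list)
    for lang, accuracy in per_language.items():
        by_family[family_of[lang]].append(accuracy)

    return {family: sum(accs) / len(accs) for family, accs in by_family.items()}


# Toy placeholder data (not real shared task results or real word forms).
gold = {"tur": ["a", "b", "c", "d"], "aze": ["e", "f"], "tgl": ["g", "h"]}
predictions = {"tur": ["a", "b", "c", "x"], "aze": ["e", "f"], "tgl": ["g", "x"]}
family_of = {"tur": "Turkic", "aze": "Turkic", "tgl": "Austronesian"}

scores = family_mean_accuracy(predictions, gold, family_of)
print(scores)
```

With the toy data, the two Turkic languages score 0.75 and 1.0, so the family mean is 0.875. A micro-average over all test items would weight languages by test set size instead and could give a different figure.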