Cross-Lingual Transfer Learning for Multilingual Task Oriented Dialog

Abstract

One of the first steps in the utterance interpretation pipeline of many task-oriented conversational AI systems is to identify user intents and the corresponding slots. Since data collection for machine learning models for this task is time-consuming, it is desirable to make use of existing data in a high-resource language to train models in low-resource languages. However, development of such models has largely been hindered by the lack of multilingual training data. In this paper, we present a new data set of 57k annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) across the domains weather, alarm, and reminder. We use this data set to evaluate three different cross-lingual transfer methods: (1) translating the training data, (2) using cross-lingual pre-trained embeddings, and (3) a novel method of using a multilingual machine translation encoder as contextual word representations. We find that given several hundred training examples in the target language, the latter two methods outperform translating the training data. Further, in very low-resource settings, multilingual contextual word representations give better results than using cross-lingual static embeddings. We also compare the cross-lingual methods to using monolingual resources in the form of contextual ELMo representations and find that given only small amounts of target language data, this method outperforms all cross-lingual methods, which highlights the need for more sophisticated cross-lingual methods.
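To make the task concrete, the following is a minimal sketch of how one annotated utterance with an intent and slots might be represented and turned into per-token BIO tags, a common encoding for slot filling. The class name, the intent label `alarm/set_alarm`, the slot name `datetime`, and the example sentence are all illustrative assumptions and are not taken from the data set itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedUtterance:
    """One utterance with a sentence-level intent and token-level slot spans.

    All label names used below are hypothetical, chosen only for illustration.
    """
    tokens: List[str]
    intent: str
    # Each slot is (start_token, end_token_exclusive, slot_name).
    slots: List[Tuple[int, int, str]]


def to_bio(utt: AnnotatedUtterance) -> List[str]:
    """Convert slot spans into per-token BIO tags."""
    tags = ["O"] * len(utt.tokens)
    for start, end, name in utt.slots:
        tags[start] = f"B-{name}"          # first token of the slot
        for i in range(start + 1, end):
            tags[i] = f"I-{name}"          # remaining tokens of the slot
    return tags


# Hypothetical example from the alarm domain.
example = AnnotatedUtterance(
    tokens="set an alarm for 7 am".split(),
    intent="alarm/set_alarm",
    slots=[(4, 6, "datetime")],
)

print(example.intent)
print(list(zip(example.tokens, to_bio(example))))
```

A model for this task would predict the intent once per utterance and one tag per token. In a cross-lingual setting the label inventory is shared across languages, and only the input tokens change.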
