
Continuous Model Improvement for Language Understanding with Machine Translation



Abstract

Scaling conversational personal assistants to a multitude of languages puts high demands on collecting and labeling data, a setting in which cross-lingual learning techniques can help to reconcile the need for well-performing natural language understanding (NLU) with the desideratum to support many languages without incurring unacceptable cost. In this paper, we show that automatically annotating unlabeled utterances using machine translation in an offline fashion and adding them to the training data can improve performance for existing NLU features in low-resource languages, where a straightforward translate-test approach, as considered in existing literature, would fail the latency requirements of a live environment. We demonstrate the effectiveness of our method with intrinsic and extrinsic evaluation using a real-world commercial dialog system in German. We show that 56% of the resulting automatically labeled utterances had a perfect match with ground-truth labels. Moreover, we see significant performance improvements in an extrinsic evaluation setting when manually labeled data is available in small quantities.
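The abstract leaves the exact pipeline unspecified, but a minimal sketch of the offline auto-annotation idea, under the assumption that labels are predicted on the high-resource (English) side and projected back onto the original utterance, might look like the following Python. The names translate, english_nlu and project_slots are hypothetical placeholders, not components published by the authors.

# Rough sketch of offline MT-based auto-annotation for low-resource NLU.
# translate, english_nlu and project_slots are hypothetical placeholders;
# the paper does not publish its exact pipeline.

from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LabeledUtterance:
    text: str              # original (e.g. German) utterance
    intent: str            # intent label taken from the translated side
    slots: Dict[str, str]  # slot labels projected back onto the original text


def auto_annotate(
    unlabeled_de: List[str],
    translate: Callable[[str], str],                            # de -> en MT, run offline
    english_nlu: Callable[[str], Tuple[str, Dict[str, str]]],   # existing high-resource NLU model
    project_slots: Callable[[str, str, Dict[str, str]], Dict[str, str]],  # label projection
) -> List[LabeledUtterance]:
    """Label unlabeled German utterances offline, producing silver training data."""
    silver = []
    for utterance in unlabeled_de:
        en_text = translate(utterance)            # offline, so MT latency is not an issue
        intent, en_slots = english_nlu(en_text)   # predict intent/slots on the English side
        de_slots = project_slots(utterance, en_text, en_slots)
        silver.append(LabeledUtterance(utterance, intent, de_slots))
    return silver

Under this reading, the silver-labeled utterances are mixed with the small manually labeled German set and the German NLU model is retrained, so that, unlike translate-test, no translation call sits on the live request path.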
