
When is Multi-task Learning Beneficial for Low-Resource Noisy Code-switched User-generated Algerian Texts?

Abstract

We investigate when it is beneficial to simultaneously learn representations for several tasks in low-resource settings. For this, we work with noisy user-generated texts in Algerian, a low-resource, non-standardised Arabic variety. To mitigate the problem of data scarcity, we experiment with progressively and jointly learning four tasks, namely code-switch detection, named entity recognition, spell normalisation and correction, and identifying users' sentiments. The selection of these tasks is motivated by the lack of labelled data for automatic morpho-syntactic or semantic sequence-tagging tasks for Algerian, in contrast to the settings addressed by much existing multi-task learning work for NLP. Our empirical results show that multi-task learning is beneficial for some tasks in particular settings, and that the effect of each task on the others, the order of the tasks, and the size of the training data of the task with more data all matter. Moreover, the data augmentation that we performed with no external resources proved beneficial for certain tasks.
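To make the joint-learning setup concrete, the sketch below shows one common way to share representations across the four tasks mentioned in the abstract: a shared encoder with task-specific heads. The abstract does not specify the paper's actual architecture, so this is only an illustrative assumption; the model class, layer sizes, and tag-set sizes are hypothetical placeholders.

```python
# A minimal sketch of joint multi-task learning over user-generated text.
# NOT the paper's architecture: it assumes a shared BiLSTM encoder with
# token-level heads for code-switch detection, NER, and spelling
# normalisation flags, plus a sentence-level head for sentiment.
# All dimensions and label counts are illustrative.
import torch
import torch.nn as nn


class MultiTaskTagger(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64, hidden_dim=128,
                 n_cs_tags=3, n_ner_tags=9, n_norm_tags=2, n_sentiments=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Shared encoder: representations are learned jointly for all tasks.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Token-level heads (sequence-tagging tasks).
        self.cs_head = nn.Linear(2 * hidden_dim, n_cs_tags)
        self.ner_head = nn.Linear(2 * hidden_dim, n_ner_tags)
        self.norm_head = nn.Linear(2 * hidden_dim, n_norm_tags)
        # Sentence-level head (sentiment), fed with mean-pooled states.
        self.sent_head = nn.Linear(2 * hidden_dim, n_sentiments)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return {
            "code_switch": self.cs_head(states),       # (B, T, n_cs_tags)
            "ner": self.ner_head(states),               # (B, T, n_ner_tags)
            "normalisation": self.norm_head(states),    # (B, T, n_norm_tags)
            "sentiment": self.sent_head(states.mean(dim=1)),  # (B, n_sentiments)
        }


if __name__ == "__main__":
    model = MultiTaskTagger()
    batch = torch.randint(1, 5000, (2, 12))  # two toy sentences, 12 tokens each
    for task, logits in model(batch).items():
        print(task, tuple(logits.shape))
```

Under this kind of setup, the ordering effects reported in the abstract would correspond to the sequence in which task losses are introduced during training, and the per-task data sizes to how often each head's loss contributes to the shared encoder's updates.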