
When is Multi-task Learning Beneficial for Low-Resource Noisy Code-switched User-generated Algerian Texts?

Abstract

We investigate when it is beneficial to simultaneously learn representations for several tasks in low-resource settings. For this, we work with noisy user-generated texts in Algerian, a low-resource, non-standardised Arabic variety. To mitigate the problem of data scarcity, we experiment with progressively and jointly learning four tasks, namely code-switch detection, named entity recognition, spell normalisation and correction, and identifying users' sentiments. The selection of these tasks is motivated by the lack of labelled data for automatic morpho-syntactic or semantic sequence-tagging tasks for Algerian, in contrast to the settings addressed by much existing multi-task learning work for NLP. Our empirical results show that multi-task learning is beneficial for some tasks in particular settings, and that the effect of each task on the others, the order of the tasks, and the size of the training data of the task with more data all matter. Moreover, the data augmentation that we performed with no external resources proved beneficial for certain tasks.
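To make the joint-learning setup concrete, the sketch below shows one common way to share representations across the four tasks mentioned in the abstract: a shared encoder with task-specific heads. The abstract does not specify the paper's actual architecture, so this is only an illustrative assumption; the model class, layer sizes, and tag-set sizes are hypothetical placeholders.

```python
# A minimal sketch of joint multi-task learning over user-generated text.
# NOT the paper's architecture: it assumes a shared BiLSTM encoder with
# token-level heads for code-switch detection, NER, and spelling
# normalisation flags, plus a sentence-level head for sentiment.
# All dimensions and label counts are illustrative.
import torch
import torch.nn as nn


class MultiTaskTagger(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64, hidden_dim=128,
                 n_cs_tags=3, n_ner_tags=9, n_norm_tags=2, n_sentiments=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Shared encoder: representations are learned jointly for all tasks.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Token-level heads (sequence-tagging tasks).
        self.cs_head = nn.Linear(2 * hidden_dim, n_cs_tags)
        self.ner_head = nn.Linear(2 * hidden_dim, n_ner_tags)
        self.norm_head = nn.Linear(2 * hidden_dim, n_norm_tags)
        # Sentence-level head (sentiment), fed with mean-pooled states.
        self.sent_head = nn.Linear(2 * hidden_dim, n_sentiments)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return {
            "code_switch": self.cs_head(states),       # (B, T, n_cs_tags)
            "ner": self.ner_head(states),               # (B, T, n_ner_tags)
            "normalisation": self.norm_head(states),    # (B, T, n_norm_tags)
            "sentiment": self.sent_head(states.mean(dim=1)),  # (B, n_sentiments)
        }


if __name__ == "__main__":
    model = MultiTaskTagger()
    batch = torch.randint(1, 5000, (2, 12))  # two toy sentences, 12 tokens each
    for task, logits in model(batch).items():
        print(task, tuple(logits.shape))
```

Under this kind of setup, the ordering effects reported in the abstract would correspond to the sequence in which task losses are introduced during training, and the per-task data sizes to how often each head's loss contributes to the shared encoder's updates.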