Conference: Annual Meeting of the Association for Computational Linguistics

Cross-domain and Cross-lingual Abusive Language Detection: a Hybrid Approach with Deep Learning and a Multilingual Lexicon

Abstract

The development of computational methods to detect abusive language in social media across variable and multilingual contexts has recently gained significant traction. The growing interest is confirmed by the large number of benchmark corpora developed for different languages in recent years. However, abusive language behaviour is multifaceted, and the available datasets differ in topical focus. This makes abusive language detection a domain-dependent task, and building a robust system to detect general abusive content is a first challenge. Moreover, most resources are available only for English, which makes detecting abusive language in low-resource languages a further challenge. We address both challenges by considering ten publicly available datasets across different domains and languages. We propose a hybrid approach that combines deep learning with a multilingual lexicon for cross-domain and cross-lingual detection of abusive content, and compare it with simpler models. We show that training a system on general abusive language datasets produces a robust cross-domain system, which can be used to detect other, more specific types of abusive content. We also find that the domain-independent lexicon HurtLex is useful for transferring knowledge between domains and languages. In the cross-lingual experiment, we demonstrate that our joint-learning model is also effective in out-of-domain scenarios.
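The abstract does not describe how the lexicon is combined with the neural model, so the following is only a minimal sketch of one common way to build a hybrid input: per-category lexicon counts are appended to a text embedding before classification. The names `TOY_LEXICON`, `lexicon_features`, and `hybrid_input` are hypothetical, and the lexicon entries and category labels are invented for illustration; they are not taken from HurtLex or from the paper.

```python
from collections import Counter

# Hypothetical toy lexicon mapping a word to a category label.
# These entries and labels are illustrative assumptions, not HurtLex data.
TOY_LEXICON = {
    "idiot": "insult",
    "stupid": "insult",
    "vermin": "animal",
}

# Fixed, sorted category order so every feature vector has the same layout.
CATEGORIES = sorted(set(TOY_LEXICON.values()))


def lexicon_features(text):
    """Count lexicon hits per category using naive whitespace tokenization."""
    tokens = text.lower().split()
    counts = Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)
    return [float(counts[c]) for c in CATEGORIES]


def hybrid_input(text_embedding, text):
    """Concatenate a text embedding (assumed to come from any encoder)
    with the lexicon count features for the same text."""
    return list(text_embedding) + lexicon_features(text)
```

Because the lexicon features depend only on word categories rather than on a dataset's topic, a vector built this way carries some signal that is shared across domains, which is the intuition the abstract attributes to the domain-independent lexicon.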