Cross-domain and Cross-lingual Abusive Language Detection: a Hybrid Approach with Deep Learning and a Multilingual Lexicon


Abstract

The development of computational methods to detect abusive language in social media within variable and multilingual contexts has recently gained significant traction. The growing interest is confirmed by the large number of benchmark corpora for different languages developed in recent years. However, abusive language behaviour is multifaceted, and the available datasets have different topical focuses. This makes abusive language detection a domain-dependent task, and building a robust system to detect general abusive content is a first challenge. Moreover, most resources are available for English, which makes detecting abusive language in low-resource languages a further challenge. We address both challenges by considering ten publicly available datasets across different domains and languages. We propose a hybrid approach that combines deep learning with a multilingual lexicon for cross-domain and cross-lingual detection of abusive content, and compare it with other, simpler models. We show that training a system on general abusive language datasets produces a system that is robust across domains and can be used to detect other, more specific types of abusive content. We also find that the domain-independent lexicon HurtLex is useful for transferring knowledge between domains and languages. In the cross-lingual experiment, we demonstrate that our joint-learning model is also effective in out-of-domain scenarios.
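
The abstract does not say how the lexicon and the neural model are combined, so the following is only a minimal sketch of one common way to build a hybrid input: count lexicon hits per category and append those counts to a text representation. Everything here is an assumption for illustration: the toy lexicon, its category names, and the helper names `lexicon_features` and `hybrid_input` are hypothetical and are not taken from the paper or from HurtLex itself.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> category. Placeholder entries only;
# these are NOT real HurtLex entries or category codes.
TOY_LEXICON = {
    "idiot": "insult",
    "stupid": "insult",
    "pig": "animal",
}

# Fixed category order so every feature vector has the same layout.
CATEGORIES = sorted(set(TOY_LEXICON.values()))


def lexicon_features(text, lexicon=TOY_LEXICON, categories=CATEGORIES):
    """Count lexicon hits per category for a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return [float(counts[c]) for c in categories]


def hybrid_input(text_vector, text):
    """Append lexicon counts to an existing text representation.

    `text_vector` stands in for whatever a neural encoder would produce;
    concatenation is an assumed design, not the paper's confirmed method.
    """
    return list(text_vector) + lexicon_features(text)
```

In a cross-lingual setting, the same idea would rely on the lexicon covering several languages with a shared set of categories, so that the count vector keeps the same layout regardless of the input language.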


Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics | 2019 | pp. 363-370 | 8 pages
Authors
Endang Wahyu Pamungkas; Viviana Patti
Original format: PDF

Similar literature


1. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach [J]. Xiaoyong Pan, Hong-Bin Shen. BMC Bioinformatics, 2017, No. 1.
2. Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach [J]. Al-Makhadmeh Zafer, Tolba Amr. Computing, 2020, No. 2.
3. Efficient Eye-Blinking Detection on Smartphones: A Hybrid Approach Based on Deep Learning [J]. Han Young-Joo, Kim Wooseong, Park Joon-Sang. Mobile Information Systems, 2018, No. PTa2.
4. Cross-domain and Cross-lingual Abusive Language Detection: a Hybrid Approach with Deep Learning and a Multilingual Lexicon [C]. Endang Wahyu Pamungkas, Viviana Patti. Annual meeting of the Association for Computational Linguistics, 2019.
5. Learning Deep Representations for Low-resource Cross-lingual Natural Language Processing [D]. Chen, Xilun. 2019.
6. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach [O]. Xiaoyong Pan, Hong-Bin Shen. 2017.
7. Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon [O]. Endang Wahyu Pamungkas, Viviana Patti. 2019.
