CFILT IIT Bombay@LT-EDI-EACL2021: Hope Speech Detection for Equality, Diversity, and Inclusion using Multilingual Representation from Transformers

Abstract

With the internet becoming part and parcel of our lives, engagement in social media has increased considerably. Identifying and eliminating offensive content from social media has become a top priority in order to prevent any kind of violence. However, detecting encouraging, supportive, and positive content is equally important to prevent the misuse of censorship to attack freedom of speech. This paper presents our system for the shared task Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI, EACL 2021. The data for this shared task was collected from YouTube comments and is provided in English, Tamil, and Malayalam. It is a multi-class classification problem in which each data instance is categorized into one of three classes: 'Hope speech', 'Not hope speech', and 'Not in intended language'. We propose a system that employs multilingual transformer models to obtain a representation of the text and classify it into one of the three classes. We explored the use of multilingual models trained specifically for Indian languages along with generic multilingual models. Our system was ranked 2nd for English, 2nd for Malayalam, and 7th for Tamil on the final leaderboard published by the organizers, and obtained weighted F1-scores of 0.92, 0.84, and 0.55, respectively, on the hidden test dataset used for the competition. We have made our system publicly available on GitHub.
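
The abstract reports results as weighted F1-scores over the three classes. As a rough illustration of that metric only, the sketch below computes a support-weighted F1 in plain Python. It is not the authors' evaluation code: the function name `weighted_f1` and the toy labels are assumptions made for this example, and the shared task's official scorer may differ in details.

```python
from collections import Counter

# Class names as given in the abstract.
LABELS = ["Hope speech", "Not hope speech", "Not in intended language"]


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Average the per-class F1 scores, weighting each class by its support.

    Hypothetical helper written for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = len(y_true)
    if total == 0:
        return 0.0

    support = Counter(y_true)  # number of true instances per class
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        score += (support[label] / total) * f1
    return score


# Toy data, invented for this example.
y_true = ["Hope speech", "Hope speech",
          "Not hope speech", "Not hope speech", "Not hope speech",
          "Not in intended language"]
y_pred = ["Hope speech", "Not hope speech",
          "Not hope speech", "Not hope speech", "Hope speech",
          "Not in intended language"]

print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class contributes in proportion to its number of true instances, a frequent class such as 'Not hope speech' influences the score more than a rare one.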

Bibliographic details

Source
Workshop on Language Technology for Equality, Diversity and Inclusion | 2021 | pp. 193-196 | 4 pages
Authors
Pankaj Singh; Prince Kumar; Pushpak Bhattacharyya
Original format
PDF

Similar literature

Foreign-language literature

1. 2021 C-Suite Awareness—From Privacy to Misinformation to Diversity, Inclusion, and Equality [J]. Andriole Stephen J. IT Professional. 2021, No. 2
2. Launch of the Diversity, Equality, and Inclusion Committee [J]. Institute of Water Journal. 2021, No. 211
3. Strengthening RIPE's commitment to equality, diversity, and inclusion in our field [J]. Bair Jennifer, Gabor Daniela, Germain Randall. Review of International Political Economy. 2021, No. 1
4. HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion [C]. Bharathi Raja Chakravarthi. Workshop on Computational Modeling of PEople's Opinions, PersonaLity, and Emotions in Social media. 2020
5. Equality Inclusion and Diversity in Healthcare During the COVID-19 Pandemic [O]. Jayoung Kim. 2020
6. Multilingual Multimodal Integration of Sketch and Speech: A Generic Speech Representation Model for Spatial Description [O]. Lee-Na Teh, Alvin W. Yeo. 2009
