HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion

Abstract

Over the past few years, systems have been developed to control online content and eliminate abusive, offensive or hate speech content. However, people in power sometimes misuse this form of censorship to obstruct the democratic right of freedom of speech. Therefore, it is imperative that research take a positive reinforcement approach towards online content that is encouraging, positive and supportive. Until now, most studies have focused on solving this problem of negativity in the English language, though the problem is much more than just harmful content. Furthermore, it is multilingual as well. Thus, we have constructed a Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) containing user-generated comments from the social media platform YouTube, with 28,451, 20,198 and 10,705 comments in English, Tamil and Malayalam, respectively, manually labelled as containing hope speech or not. To our knowledge, this is the first research of its kind to annotate hope speech for equality, diversity and inclusion in a multilingual setting. We determined the inter-annotator agreement of our dataset using Krippendorff's alpha. Further, we created several baselines to benchmark the resulting dataset, and the results are reported using precision, recall and F1-score. The dataset is publicly available for the research community. We hope that this resource will spur further research on encouraging inclusive and responsive speech that reinforces positiveness.
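The abstract names Krippendorff's alpha for inter-annotator agreement and precision, recall and F1-score for the baselines, without giving formulas. The following is a minimal sketch of how those quantities are commonly computed for a two-class labelling task. It is not the authors' code: the function names, the label strings "hope" and "not_hope", and the toy annotations are all hypothetical, and the alpha shown is the standard nominal-data variant, which is an assumption about the variant used.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels that the
    annotators assigned to one comment (hypothetical input layout).
    """
    # Coincidence matrix: every ordered pair of labels within a comment
    # contributes 1 / (m - 1), where m is the number of labels it received.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one comment with two or more labels")

    # Observed and expected disagreement (up to the common factor 1 / n).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy data (invented for illustration): four comments, two annotators each.
toy_annotations = [
    ["hope", "hope"],
    ["hope", "not_hope"],
    ["not_hope", "not_hope"],
    ["not_hope", "not_hope"],
]
alpha = krippendorff_alpha_nominal(toy_annotations)

# Toy gold labels and classifier predictions (also invented).
toy_gold = ["hope", "hope", "not_hope", "not_hope"]
toy_pred = ["hope", "not_hope", "not_hope", "hope"]
scores = precision_recall_f1(toy_gold, toy_pred, positive="hope")
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Per-class scores like these are typically averaged (macro or weighted) when a benchmark reports a single figure; which averaging the paper uses is not stated in the abstract.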

Bibliographic details

Source: Workshop on Computational Modeling of PEople's Opinions, PersonaLity, and Emotions in Social media | 2020 | pp. 41-53 | 13 pages
Author: Bharathi Raja Chakravarthi
Original format: PDF

