Workshop on Computational Modeling of PEople's Opinions, PersonaLity, and Emotions in Social media

HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion

Abstract

Over the past few years, systems have been developed to control online content and eliminate abusive, offensive or hateful speech. However, people in power sometimes misuse this form of censorship to obstruct the democratic right of freedom of speech. Therefore, it is imperative that research take a positive reinforcement approach towards online content that is encouraging, positive and supportive. Until now, most studies have focused on solving this problem of negativity in the English language, though the problem is much more than just harmful content. Furthermore, it is multilingual as well. Thus, we have constructed a Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) containing user-generated comments from the social media platform YouTube, with 28,451, 20,198 and 10,705 comments in English, Tamil and Malayalam, respectively, manually labelled as containing hope speech or not. To our knowledge, this is the first research of its kind to annotate hope speech for equality, diversity and inclusion in a multilingual setting. We determined the inter-annotator agreement of our dataset using Krippendorff's alpha. Further, we created several baselines to benchmark the resulting dataset, and the results are reported using precision, recall and F1-score. The dataset is publicly available for the research community. We hope that this resource will spur further research on encouraging inclusive and responsive speech that reinforces positiveness.
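The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the label names `"hope"` and `"not"`, the function names, and the toy data are all assumptions made for illustration. The abstract also does not say which variant of Krippendorff's alpha or which F1 averaging scheme was used, so the sketch assumes the nominal-data form of alpha and per-class scores for a single positive class.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of per-comment rating lists, one entry per annotator;
    None marks a missing rating. Units with fewer than two ratings are skipped.
    """
    coincidences = Counter()
    for ratings in units:
        ratings = [r for r in ratings if r is not None]
        m = len(ratings)
        if m < 2:
            continue
        # Every ordered pair of ratings from different annotators in the
        # same unit contributes 1 / (m - 1) to the coincidence matrix.
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable ratings")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


def precision_recall_f1(gold, predicted, positive="hope"):
    """Precision, recall and F1 for one class (hypothetical label 'hope')."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy data, invented for illustration: two annotators, four comments.
toy_units = [["hope", "hope"], ["hope", "not"], ["not", "not"], ["not", "not"]]
alpha = krippendorff_alpha_nominal(toy_units)

toy_gold = ["hope", "hope", "not", "not"]
toy_pred = ["hope", "not", "hope", "not"]
precision, recall, f1 = precision_recall_f1(toy_gold, toy_pred)
```

Alpha equals 1 when annotators agree on every comment and falls towards 0 as agreement approaches chance level, which is why it suits a corpus where not every annotator labels every comment.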
