Workshop on Speech and Language Technologies for Dravidian Languages

JUNLP@DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages

Abstract

Offensive language identification is an active area of research in natural language processing, and the emergence of multiple social media platforms has made it a pressing need. Traditional offensive language identification models fail to deliver acceptable results because social media content is largely multilingual and code-mixed. This paper addresses the problem by using IndicBERT and BERT architectures to identify offensive language in Kannada-English, Malayalam-English, and Tamil-English code-mixed text extracted from social media. Evaluated on the test corpus, the presented approach achieved precision, recall, and F1 scores of 0.62, 0.71, and 0.66 for Kannada-English; 0.77, 0.43, and 0.53 for Malayalam-English; and 0.71, 0.74, and 0.72 for Tamil-English.
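
As a reading aid for the reported scores, the sketch below recomputes F1 as the harmonic mean of the stated precision and recall and sets it beside the reported F1. The helper names `f1_from` and `compare` are hypothetical and not from the paper. The Kannada-English and Tamil-English figures match the harmonic mean after rounding, while the Malayalam-English figure does not (0.55 versus the reported 0.53). A gap like this can arise when precision, recall, and F1 are each averaged across classes; the abstract does not say which averaging was used, so that explanation is only an assumption.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) exactly as stated in the abstract.
reported = {
    "Kannada-English": (0.62, 0.71, 0.66),
    "Malayalam-English": (0.77, 0.43, 0.53),
    "Tamil-English": (0.71, 0.74, 0.72),
}


def compare(scores):
    """Map each language pair to (harmonic mean rounded to 2 places, reported F1)."""
    return {
        pair: (round(f1_from(p, r), 2), f1)
        for pair, (p, r, f1) in scores.items()
    }


for pair, (harmonic, f1) in compare(reported).items():
    print(f"{pair}: harmonic mean {harmonic:.2f}, reported F1 {f1:.2f}")
```

If the per-class scores were available, the same comparison could be repeated per class, where F1 should equal the harmonic mean exactly.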