Venue: Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; International Workshop on Semantic Evaluation
CiTIUS-COLE at SemEval-2019 Task 5: Combining Linguistic Features to Identify Hate Speech Against Immigrants and Women on Multilingual Tweets

Abstract

This article describes the strategy submitted by the CiTIUS-COLE team to SemEval-2019 Task 5, a binary classification task in which the system predicts whether a tweet in English or Spanish is hateful towards women or immigrants. The proposed strategy relies on combining linguistic features to improve the classifier's performance. More precisely, the method combines textual and lexical features, pairing word embeddings with a bag-of-words representation weighted by Term Frequency-Inverse Document Frequency (TF-IDF). The system reaches about 81% F1 on the training dataset, but its F1 for the hate speech class drops to 36% for English and 64% for Spanish on the official test dataset.
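
The abstract does not say how the features are combined, so the following is only a minimal sketch of one plausible reading: a TF-IDF bag-of-words vector, an averaged word-embedding vector, and a single lexicon-based feature are concatenated into one feature vector per tweet. The toy corpus, the two-dimensional embeddings, the lexicon, and all function names are hypothetical and are not taken from the CiTIUS-COLE system.

```python
import math
from collections import Counter

# Toy corpus (hypothetical; not from the SemEval-2019 Task 5 data).
docs = [
    "they should all go home",
    "welcome to our city friends",
    "go home now",
]

# Hypothetical 2-dimensional word embeddings; a real system would
# load pretrained vectors instead.
EMBEDDINGS = {
    "home": [0.9, 0.1],
    "go": [0.8, 0.2],
    "welcome": [0.1, 0.9],
    "friends": [0.2, 0.8],
}
EMB_DIM = 2

# Hypothetical lexicon of flagged terms, standing in for a lexical feature.
LEXICON = {"go", "home"}


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def build_idf(tokenized_docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(tokenized_docs)
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf_vector(tokens, vocab, idf):
    """Relative term frequency times IDF, one entry per vocabulary term."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[term] / total * idf[term] for term in vocab]


def mean_embedding(tokens):
    """Average the embeddings of known tokens; zeros if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * EMB_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def lexicon_ratio(tokens):
    """Share of tokens that appear in the lexicon."""
    if not tokens:
        return 0.0
    return sum(t in LEXICON for t in tokens) / len(tokens)


def combined_features(text, vocab, idf):
    """Concatenate TF-IDF, mean embedding, and the lexicon feature."""
    tokens = tokenize(text)
    return tfidf_vector(tokens, vocab, idf) + mean_embedding(tokens) + [lexicon_ratio(tokens)]


tokenized = [tokenize(d) for d in docs]
idf = build_idf(tokenized)
vocab = sorted(idf)
features = [combined_features(d, vocab, idf) for d in docs]
```

Each row of `features` could then be passed to any binary classifier. Concatenation is only one way to combine the feature groups; the source gives no detail on the classifier or on how the groups are weighted.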