Journal: Future Generation Computer Systems

Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings


Abstract

Online social networks allow powerless people to gain enormous control over other people's lives and to profit from the anonymity or social distance that the Internet provides in order to harass them. Women are one of the most frequently targeted groups, as misogyny is, unfortunately, a reality in our society. Although great efforts have recently been made to identify misogyny, it remains difficult to detect because it can be very subtle and deep, which means that statistical approaches alone are not sufficient. Moreover, as Spanish is spoken worldwide, contextual and cultural differences can complicate this identification. Our contribution to the detection of misogyny in Spanish is two-fold. First, we apply Sentiment Analysis and Social Computing technologies to detect misogynous messages on Twitter. Second, we have compiled the Spanish MisoCorpus-2020, a balanced corpus regarding misogyny in Spanish, and divided it into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines a classification based on average word embeddings with linguistic features in order to understand which linguistic phenomena principally contribute to the identification of misogyny. We evaluated our proposal with three machine-learning classifiers, achieving a best accuracy of 85.175%. Finally, the proposed approach is also validated on existing corpora for misogyny and aggressiveness detection, such as AMI and HatEval, obtaining good results.
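The combination of average word embeddings and linguistic features described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy embedding table, the three surface features, and the names `TOY_EMBEDDINGS`, `average_embedding`, `linguistic_features` and `featurize` are all assumptions made for illustration. The abstract does not list the actual feature set or name the classifiers.

```python
from typing import Dict, List

# Toy 3-dimensional embeddings (assumption). Real work would load
# pretrained Spanish word vectors instead of this hand-written table.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "hola": [0.1, 0.3, 0.5],
    "mundo": [0.3, 0.1, 0.1],
}
DIM = 3


def average_embedding(tokens: List[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the vectors of known tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def linguistic_features(text: str) -> List[float]:
    """Hypothetical surface features, not the feature set used in the paper."""
    n_chars = max(len(text), 1)
    return [
        float(len(text.split())),                   # token count
        sum(c.isupper() for c in text) / n_chars,   # share of uppercase characters
        float(text.count("!")),                     # closing exclamation marks
    ]


def featurize(text: str) -> List[float]:
    """Concatenate the averaged embedding with the linguistic features."""
    tokens = [t.strip("¡!¿?.,;:").lower() for t in text.split()]
    return average_embedding(tokens, TOY_EMBEDDINGS, DIM) + linguistic_features(text)


if __name__ == "__main__":
    print(featurize("¡Hola mundo!"))
```

Each tweet becomes one fixed-length vector that any standard classifier could consume. Keeping the linguistic features as separate, named dimensions is what would make it possible to inspect which of them contribute most to a prediction.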
