Journal: Data Science and Engineering

Fine-Grained Multi-label Sexism Classification Using a Semi-Supervised Multi-level Neural Approach



Abstract

Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the increasing number of experiences of sexism shared online, categorizing these recollections automatically can support the battle against sexism, since it can promote successful evaluations by gender studies researchers and government representatives engaged in policy making. In this paper, we examine the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, we consider substantially more categories of sexism than any related prior work through our 23-class problem formulation. Moreover, we present the first semi-supervised work for the multi-label classification of accounts describing any type(s) of sexism. We devise self-training-based techniques tailor-made for the multi-label nature of the problem to utilize unlabeled samples for augmenting the labeled set. We identify high textual diversity with respect to the existing labeled set as a desirable quality for candidate unlabeled instances and develop methods for incorporating it into our approach. We also explore ways of infusing class imbalance alleviation for multi-label classification into our semi-supervised learning, independently and in conjunction with the method involving diversity. In addition to data augmentation methods, we develop a neural model which combines biLSTM and attention with a domain-adapted BERT model in an end-to-end trainable manner. Further, we formulate a multi-level training approach in which models are sequentially trained using categories of sexism of different levels of granularity. Moreover, we devise a loss function that exploits any label confidence scores associated with the data. Several proposed methods outperform various baselines on a recently released dataset for multi-label sexism categorization across several standard metrics.
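The abstract does not spell out how unlabeled accounts are chosen for self-training, so the following is only a rough sketch of one way such a step could look, not the authors' method. It assumes (hypothetically) that a candidate is kept when every label probability is confidently high or low, at least one label is predicted positive, and the text is sufficiently dissimilar to the labeled set. Diversity is approximated here by token-level Jaccard similarity; the function names, thresholds, and the diversity measure are all illustrative assumptions.

```python
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversity(text: str, labeled_texts: Sequence[str]) -> float:
    """Assumed diversity score: 1 minus the highest Jaccard similarity
    between this text and any text already in the labeled set."""
    tokens = set(text.lower().split())
    if not labeled_texts:
        return 1.0
    return 1.0 - max(
        jaccard(tokens, set(t.lower().split())) for t in labeled_texts
    )


def select_pseudo_labeled(
    unlabeled_texts: Sequence[str],
    probs: Sequence[Sequence[float]],
    labeled_texts: Sequence[str],
    hi: float = 0.9,             # hypothetical threshold for a confident positive
    lo: float = 0.1,             # hypothetical threshold for a confident negative
    min_diversity: float = 0.5,  # hypothetical minimum diversity to accept
) -> List[Tuple[str, List[int]]]:
    """Return (text, pseudo_labels) pairs that pass both filters.

    probs[i][j] is the model's predicted probability that unlabeled
    text i carries label j (multi-label, so rows need not sum to 1).
    """
    selected = []
    for text, p in zip(unlabeled_texts, probs):
        # Every label must be predicted confidently, one way or the other.
        confident = all(q >= hi or q <= lo for q in p)
        labels = [1 if q >= hi else 0 for q in p]
        if (
            confident
            and any(labels)
            and diversity(text, labeled_texts) >= min_diversity
        ):
            selected.append((text, labels))
    return selected
```

In a self-training loop, the selected pairs would be appended to the labeled set before the classifier is retrained; a class-imbalance criterion could be added as a further filter on `labels`.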
