Online texts, across genres, registers, domains, and styles, are riddled with human stereotypes, expressed in overt or subtle ways. Word embeddings trained on these texts perpetuate and amplify these stereotypes and propagate biases to machine learning models that use word embeddings as features. In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) from the binary setting, such as binary gender. Next, we propose a novel methodology for the evaluation of multiclass debiasing. We demonstrate that our multiclass debiasing is robust and maintains efficacy in standard NLP tasks.
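As a rough illustration of what subspace-based debiasing can look like when a bias has more than two classes, the sketch below follows the general recipe of Bolukbasi et al. (2016): estimate a bias subspace from sets of class-defining words, then remove each neutral word's component in that subspace. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `bias_subspace` and `neutralize`, the use of sets with three members each, the choice of `k`, and the toy 3-dimensional vectors are all hypothetical.

```python
import numpy as np


def bias_subspace(defining_sets, k=1):
    """Estimate a k-dimensional bias subspace (hypothetical helper).

    Each defining set holds the vectors of words that differ mainly in the
    protected class. Every set is centred on its own mean, and the top-k
    principal directions of the stacked deviations are returned as rows.
    """
    deviations = []
    for group in defining_sets:
        group = np.asarray(group, dtype=float)
        deviations.append(group - group.mean(axis=0))
    stacked = np.vstack(deviations)
    # Rows of vt are orthonormal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k]


def neutralize(vector, basis):
    """Remove the component of `vector` inside the subspace, then renormalize."""
    vector = np.asarray(vector, dtype=float)
    projection = basis.T @ (basis @ vector)
    residual = vector - projection
    norm = np.linalg.norm(residual)
    return residual / norm if norm > 0 else residual


# Toy data (made up): two defining sets with three classes each, where the
# class information lives in the first two coordinates only.
defining_sets = [
    [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [-1.0, -1.0, 0.5]],
    [[0.9, 0.1, -0.2], [0.1, 0.9, -0.2], [-1.0, -1.0, -0.2]],
]
basis = bias_subspace(defining_sets, k=2)

# A word that should be class-neutral but leans toward one class.
neutral_word = [0.6, 0.3, 0.74]
debiased = neutralize(neutral_word, basis)
```

After `neutralize`, the vector has no remaining component along the estimated bias directions and has unit length. How many directions to keep and which defining sets to use are modelling choices that this sketch does not settle.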