Relating Word Embedding Gender Biases to Gender Gaps: A Cross-Cultural Analysis

Abstract

Modern models for common NLP tasks often employ machine learning techniques and train on journalistic, social media, or other culturally derived text. These models have recently been scrutinized for racial and gender biases stemming from inherent bias in their training text. Such biases are often undesirable, and recent work proposes methods to rectify them; however, these same biases may shed light on actual racial or gender gaps in the culture(s) that produced the training text, thereby helping us understand cultural context through big data. This paper presents an approach for quantifying gender bias in word embeddings and then using these measurements to characterize statistical gender gaps in education, politics, economics, and health. We validate these metrics on 2018 Twitter data spanning 51 U.S. regions and 99 countries. We correlate state- and country-level word embedding biases with 18 international and 5 U.S.-based statistical gender gaps, characterizing regularities and predictive strength.
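The abstract does not specify the paper's exact bias metric, but a common way to quantify gender bias in word embeddings is to project a word's vector onto a gender direction such as he − she. The sketch below illustrates this with hypothetical toy vectors; real analyses would use embeddings trained on the regional Twitter text described above.

```python
import math

# Toy 4-dimensional embeddings standing in for pretrained vectors.
# These numbers are invented for illustration only.
EMB = {
    "he":       [ 0.9, 0.1, 0.3, 0.0],
    "she":      [-0.9, 0.1, 0.3, 0.0],
    "engineer": [ 0.5, 0.4, 0.2, 0.1],
    "nurse":    [-0.6, 0.3, 0.1, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def gender_bias(word):
    """Similarity of a word with the he-minus-she axis:
    positive values lean male, negative lean female."""
    axis = [a - b for a, b in zip(EMB["he"], EMB["she"])]
    return cosine(EMB[word], axis)

for w in ("engineer", "nurse"):
    print(w, round(gender_bias(w), 3))
```

Aggregating such per-word scores over occupation or attribute word lists, per region, yields a regional bias score that can then be correlated with statistical gender-gap indices.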
机译:用于NLP常见任务的现代模型通常采用机器学习技术,并在新闻,社交媒体或其他文化衍生的文本上进行培训。这些问题最近因种族和性别偏见而受到审查,这源于其培训文本中的固有偏见。这些偏见通常不是最理想的,最近的工作提出了纠正它们的方法。但是,这些偏见可能会揭示出产生培训文本的文化中的实际种族或性别差距,从而帮助我们通过大数据来理解文化背景。本文提出了一种量化单词嵌入中性别偏见的方法,然后使用它们来表征教育,政治,经济和健康方面的统计性别差距。我们在涵盖51个美国地区和99个国家/地区的2018 Twitter数据上验证了这些指标。我们将州和国家/地区字词嵌入偏向与18个国际和5个美国基于统计的性别差距相关联,以表征规律性和预测强度。
