PLoS One

Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types

Abstract

Concerns about gender bias in word embedding models have captured substantial attention in the algorithmic bias research literature. Other bias types, however, have received far less scrutiny. This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity, but also along the less frequently studied dimensions of socioeconomic status, age, physical appearance, sexual orientation, religious sentiment and political leanings. Consistent with previous scholarly literature, this work finds systemic bias against given names popular among African-Americans in most of the embedding models examined. Gender bias in embedding models, however, appears to be multifaceted and is often reversed in polarity from what has been regularly reported. Interestingly, using the common operationalization of the term bias in the fairness literature, novel, so far unreported bias types in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle- and working-class socioeconomic status, male children, senior citizens, plain physical appearance, and intellectual phenomena such as Islamic religious faith, non-religiosity and conservative political orientation. The reasons for the paradoxical underreporting of these bias types in the relevant literature are probably manifold, but widely held blind spots when searching for algorithmic bias and a lack of widespread technical jargon to unambiguously describe the variety of algorithmic associations could conceivably be playing a role. The causal origins of the multiplicity of loaded associations attached to distinct demographic groups within embedding models are often unclear, but the heterogeneity of these associations and their potentially multifactorial roots raise doubts about the validity of grouping them all under the umbrella term "bias". Richer and more fine-grained terminology, as well as a more comprehensive exploration of the bias landscape, could help the fairness epistemic community characterize and neutralize algorithmic discrimination more efficiently.
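To make the screening procedure concrete, the following is a minimal, self-contained sketch (not the authors' code) of the common operationalization of bias referred to above: each target word is scored by its mean cosine similarity to positive-sentiment lexicon words minus its mean similarity to negative-sentiment ones, and the average scores of two demographic word groups are then compared. The word lists and randomly initialized vectors below are illustrative placeholders; an actual screen would use a large pretrained embedding model and a full sentiment lexicon.

```python
# Hedged sketch of sentiment-association screening in a word embedding model.
# All vectors and word lists here are toy placeholders for illustration only.
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentiment_association(word, embeddings, positive, negative):
    # Mean similarity to positive lexicon words minus mean similarity to negative ones.
    vec = embeddings[word]
    pos = np.mean([cosine(vec, embeddings[w]) for w in positive])
    neg = np.mean([cosine(vec, embeddings[w]) for w in negative])
    return pos - neg

def group_bias(group_a, group_b, embeddings, positive, negative):
    # Difference in average sentiment association between two groups of target words.
    score = lambda grp: np.mean(
        [sentiment_association(w, embeddings, positive, negative) for w in grp])
    return score(group_a) - score(group_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = ["joy", "love", "pain", "hate", "emily", "lakisha", "greg", "jamal"]
    # Placeholder vectors; a real analysis would load pretrained embeddings here.
    embeddings = {w: rng.normal(size=50) for w in vocab}
    positive, negative = ["joy", "love"], ["pain", "hate"]
    print(group_bias(["emily", "greg"], ["lakisha", "jamal"],
                     embeddings, positive, negative))
```

A positive score would indicate that the first group of target words sits closer to positive-sentiment vocabulary than the second group; the same comparison can be repeated along any of the demographic dimensions discussed in the abstract.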
