PLoS One

Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types

Abstract

Concerns about gender bias in word embedding models have captured substantial attention in the algorithmic bias research literature. Other bias types, however, have received far less scrutiny. This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity, but also along the less frequently studied dimensions of socioeconomic status, age, physical appearance, sexual orientation, religious sentiment and political leanings. Consistent with previous scholarly literature, this work finds systemic bias against given names popular among African-Americans in most of the embedding models examined. Gender bias in embedding models, however, appears to be multifaceted and is often reversed in polarity from what has been regularly reported. Interestingly, using the common operationalization of the term bias in the fairness literature, novel, so far unreported bias types in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle- and working-class socioeconomic status, male children, senior citizens, plain physical appearance, and intellectual phenomena such as Islamic religious faith, non-religiosity and conservative political orientation. The reasons for the paradoxical underreporting of these bias types in the relevant literature are probably manifold, but widely held blind spots when searching for algorithmic bias and a lack of widespread technical jargon to unambiguously describe the variety of algorithmic associations could conceivably be playing a role. The causal origins of the multiplicity of loaded associations attached to distinct demographic groups within embedding models are often unclear, but the heterogeneity of these associations and their potentially multifactorial roots raise doubts about the validity of grouping them all under the umbrella term "bias". Richer and more fine-grained terminology, as well as a more comprehensive exploration of the bias landscape, could help the fairness epistemic community characterize and neutralize algorithmic discrimination more efficiently.
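To make the screening procedure concrete, the following is a minimal, self-contained sketch (not the authors' code) of the common operationalization of bias referred to above: each target word is scored by its mean cosine similarity to positive-sentiment lexicon words minus its mean similarity to negative-sentiment ones, and the average scores of two demographic word groups are then compared. The word lists and randomly initialized vectors below are illustrative placeholders; an actual screen would use a large pretrained embedding model and a full sentiment lexicon.

```python
# Hedged sketch of sentiment-association screening in a word embedding model.
# All vectors and word lists here are toy placeholders for illustration only.
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentiment_association(word, embeddings, positive, negative):
    # Mean similarity to positive lexicon words minus mean similarity to negative ones.
    vec = embeddings[word]
    pos = np.mean([cosine(vec, embeddings[w]) for w in positive])
    neg = np.mean([cosine(vec, embeddings[w]) for w in negative])
    return pos - neg

def group_bias(group_a, group_b, embeddings, positive, negative):
    # Difference in average sentiment association between two groups of target words.
    score = lambda grp: np.mean(
        [sentiment_association(w, embeddings, positive, negative) for w in grp])
    return score(group_a) - score(group_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = ["joy", "love", "pain", "hate", "emily", "lakisha", "greg", "jamal"]
    # Placeholder vectors; a real analysis would load pretrained embeddings here.
    embeddings = {w: rng.normal(size=50) for w in vocab}
    positive, negative = ["joy", "love"], ["pain", "hate"]
    print(group_bias(["emily", "greg"], ["lakisha", "jamal"],
                     embeddings, positive, negative))
```

A positive score would indicate that the first group of target words sits closer to positive-sentiment vocabulary than the second group; the same comparison can be repeated along any of the demographic dimensions discussed in the abstract.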
