The VLDB Journal

Managing bias and unfairness in data for decision support: a survey of machine learning and data engineering approaches to identify and mitigate bias and unfairness within data management and analytics systems



Abstract

The increasing use of data-driven decision support systems in industry and government is accompanied by the discovery of a plethora of bias and unfairness issues in the outputs of these systems. Multiple computer science communities, and especially machine learning, have started to tackle this problem, often developing algorithmic solutions to mitigate biases and obtain fairer outputs. However, one of the core underlying causes of unfairness is bias in the training data, which is not fully covered by such approaches. In particular, bias in data is not yet a central topic in data engineering and management research. We survey research on bias and unfairness in several computer science domains, distinguishing between data management publications and other domains. This covers the creation of fairness metrics, fairness identification and mitigation methods, software engineering approaches, and biases in crowdsourcing activities. We identify relevant research gaps and show which data management activities could be repurposed to handle biases and which ones might reinforce such biases. In the second part, we argue for a novel data-centered approach that overcomes the limitations of current algorithm-centered methods. This approach focuses on eliciting and enforcing fairness requirements and constraints on the data that systems are trained, validated, and used on. We argue for the need to extend database management systems to handle such constraints and mitigation methods. We discuss the associated future research directions regarding algorithms, formalization, modelling, users, and systems.
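To make the abstract's notion of "fairness requirements and constraints on data" concrete, here is a minimal, self-contained sketch of one widely used data-level check: comparing per-group selection rates against the four-fifths (80%) disparate-impact rule. This example is not from the survey itself; the record layout, field names (`group`, `approved`), and the 0.8 threshold are illustrative assumptions.

```python
# Illustrative sketch of a data-level fairness constraint:
# demographic parity checked via the four-fifths rule.

def selection_rates(records, group_key, outcome_key):
    """Positive-outcome rate per group in a list of dict records."""
    totals, positives = {}, {}
    for r in records:
        g = r[group_key]
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if r[outcome_key] else 0)
    return {g: positives[g] / totals[g] for g in totals}

def satisfies_disparate_impact(records, group_key, outcome_key, threshold=0.8):
    """True if the lowest group's selection rate is at least
    `threshold` times the highest group's rate (four-fifths rule)."""
    rates = selection_rates(records, group_key, outcome_key)
    return min(rates.values()) >= threshold * max(rates.values())

# Toy dataset: group A is approved at twice the rate of group B.
data = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

print(selection_rates(data, "group", "approved"))
print(satisfies_disparate_impact(data, "group", "approved"))  # → False
```

A constraint of this kind could in principle be enforced at data-management time (e.g. validated on ingest or before training), which is the direction the survey's second part argues for.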
