
Fair and balanced?

Abstract

Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses concerning processes of bug introduction, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect: only a fraction of bug fixes are actually labelled in source code version histories, and thus become available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects, and find strong evidence of systematic bias. We then investigate the potential effects of "unfair, imbalanced" datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data.
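The extraction step the abstract describes is commonly implemented with a keyword/issue-ID heuristic over commit messages: commits that mention a bug identifier or a fix keyword are labelled as bug fixes, and all others are missed, which is exactly the source of the sampling bias studied here. A minimal sketch of such a heuristic, assuming illustrative commit messages and patterns (not taken from the paper):

```python
import re

# Heuristic labelling of bug-fix commits, as typically done when mining
# version histories. Commits whose messages carry no marker (the last
# example below) are silently excluded from the extracted dataset,
# producing the kind of systematic bias the paper investigates.

BUG_ID = re.compile(r"(?:bug|issue|fix(?:es|ed)?)\s*#?\d+", re.IGNORECASE)
KEYWORDS = ("fix", "bug", "defect", "patch")

def is_labelled_bug_fix(message: str) -> bool:
    """Return True if a commit message looks like a labelled bug fix."""
    text = message.lower()
    return bool(BUG_ID.search(text)) or any(k in text for k in KEYWORDS)

commits = [
    "Fixes #1234: null pointer in parser",
    "Refactor build scripts",
    "bug 42: off-by-one in loop bound",
    "Corrected boundary check",  # a real fix, but unlabelled -> missed
]

labelled = [m for m in commits if is_labelled_bug_fix(m)]
# Only the explicitly marked commits survive into the dataset.
```

Note the weaknesses this sketch shares with real extractors: substring matching over-matches (e.g. "prefix" contains "fix"), and fixes committed without any marker are lost entirely, so the extracted sample need not represent the full population of fixes.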
