Emergent Filters: Automated Data Verification in a Large-Scale Citizen Science Project


Abstract

Research projects that rely on volunteers ("citizen scientists") to collect data on organism occurrence must address observer variability and species misidentification. Although citizen science projects can engage very large numbers of volunteers and collect large volumes of data, those data are prone to reporting errors. Our experience with eBird, a citizen science project that engages tens of thousands of volunteers to collect bird observations, has shown that a massive effort by volunteer experts is needed to screen data, identify outliers, and flag them in the database. The increasing volume of data collected by eBird places a heavy burden on these volunteer experts. To reduce this human effort, we explored whether previously collected eBird data can be used to create automated quality filters that emerge from the data. We do this through a two-step process. First, a data-based method detects outliers (i.e., observations that are unusual for a given region and week of the year). Next, a novel machine learning method that estimates observer expertise is used to decide whether the unusual observation should be flagged. Our preliminary findings indicate that this automated process reliably identifies outliers and accurately classifies each one as either an error or a potentially valuable observation.
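The two-step process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how outliers are scored or how observer expertise is estimated. The `Observation` type, the function names, the thresholds `min_freq` and `min_expertise`, and the precomputed `expertise` scores are all hypothetical, and a simple reporting frequency per region and week stands in for the data-based outlier method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    observer: str
    species: str
    region: str
    week: int  # week of the year


def reporting_frequency(history):
    """Share of historical reports in each (region, week) cell that name each species."""
    cell_totals = Counter((o.region, o.week) for o in history)
    species_counts = Counter((o.region, o.week, o.species) for o in history)
    return {
        (region, week, species): n / cell_totals[(region, week)]
        for (region, week, species), n in species_counts.items()
    }


def is_outlier(obs, freq, min_freq=0.05):
    """Step 1: an observation is unusual if its species is rarely reported
    for that region and week. min_freq is an assumed threshold."""
    return freq.get((obs.region, obs.week, obs.species), 0.0) < min_freq


def triage(obs, freq, expertise, min_freq=0.05, min_expertise=0.7):
    """Step 2: use an observer expertise score (assumed to be precomputed,
    in [0, 1]) to decide whether an unusual observation is flagged."""
    if not is_outlier(obs, freq, min_freq):
        return "accept"
    if expertise.get(obs.observer, 0.0) >= min_expertise:
        return "accept_as_notable"
    return "flag_for_review"


# Usage with made-up data: one rare species among many common reports.
history = [Observation("a", "American Robin", "NY", 20)] * 39
history.append(Observation("b", "Snowy Owl", "NY", 20))
freq = reporting_frequency(history)
expertise = {"expert": 0.9, "novice": 0.2}
result = triage(Observation("novice", "Snowy Owl", "NY", 20), freq, expertise)
```

In this sketch an unusual report from a high-expertise observer is kept as potentially valuable, while the same report from a low-expertise or unknown observer is routed to a human reviewer. This mirrors the abstract's goal of reducing, rather than eliminating, expert screening.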


Bibliographic details

Source: 2011 IEEE Seventh International Conference on e-Science Workshops, 2011, pp. 20-27 (8 pages)
Conference location: Stockholm (SE)
Authors: Kelling Steve; Yu Jun; Gerbracht Jeff; Wong Weng-Keen
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords: citizen-science; data quality; data-base filters; machine learning; species occurrence


