Published in: 2011 IEEE Seventh International Conference on e-Science Workshops

Emergent Filters: Automated Data Verification in a Large-Scale Citizen Science Project

Abstract

Research projects that rely on volunteers (“citizen scientists”) to collect data on organism occurrence must address observer variability and species misidentification. Citizen science projects can engage a very large number of volunteers and collect large volumes of data, but those data are prone to reporting errors. Our experience with eBird, a citizen science project that engages tens of thousands of volunteers to collect bird observations, has shown that a massive effort by volunteer experts is needed to screen the data, identify outliers, and flag them in the database. The increasing volume of data collected by eBird places a heavy burden on these volunteer experts. To reduce this human effort, we explored whether previously collected eBird data can be used to create automated quality filters that emerge from the data. We do this in a two-step process. First, a data-driven method detects outliers (i.e., observations that are unusual for a given region and week of the year). Next, a novel machine learning method that estimates observer expertise is used to decide whether the unusual observation should be flagged. Our preliminary findings indicate that this automated process reliably identifies outliers and accurately classifies each one as either an error or a potentially valuable observation.
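The two-step process described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify how outliers are scored or how expertise is estimated, so the `Observation` record, the frequency-based outlier rule, the `min_frequency` and `expert_cutoff` thresholds, and the precomputed `expertise` scores are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    # Hypothetical record layout; real eBird checklists carry far more fields.
    observer: str
    species: str
    region: str
    week: int  # week of the year


def report_frequencies(history):
    """Share of past observations in each (region, week) that name each species."""
    totals = Counter()
    per_species = Counter()
    for obs in history:
        totals[(obs.region, obs.week)] += 1
        per_species[(obs.region, obs.week, obs.species)] += 1
    return {key: count / totals[key[:2]] for key, count in per_species.items()}


def triage(obs, frequencies, expertise, min_frequency=0.05, expert_cutoff=0.8):
    """Return 'accept', 'keep', or 'flag' for a new observation.

    Step 1 (assumed rule): a species is unusual if it makes up less than
    min_frequency of past observations for that region and week.
    Step 2 (assumed rule): an unusual report is kept when the observer's
    expertise score reaches expert_cutoff, and flagged for review otherwise.
    The expertise scores are taken as given; the paper estimates them with a
    machine learning model that is not reproduced here.
    """
    freq = frequencies.get((obs.region, obs.week, obs.species), 0.0)
    if freq >= min_frequency:
        return "accept"
    if expertise.get(obs.observer, 0.0) >= expert_cutoff:
        return "keep"
    return "flag"
```

With frequencies built from historical records, `triage` accepts common species outright, keeps a rare report from a high-scoring observer as potentially valuable, and flags the same rare report from a low-scoring or unknown observer for expert review.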