Published in: Ecological informatics: an international journal on ecoinformatics and computational ecology

Small values in big data: The continuing need for appropriate metadata



Abstract

Compiling data from disparate sources to address pressing ecological issues is increasingly common. Many ecological datasets contain left-censored data: observations below an analytical detection limit. Studies of single, typically small datasets show that common approaches for handling censored data (e.g., deleting them or substituting fixed values) introduce systematic biases. However, no studies have explored the degree to which the documentation and presence of censored data influence outcomes from large, multi-sourced datasets. We describe left-censored data in a lake water quality database assembled from 74 sources and illustrate the challenges of dealing with small values in big data, including detection limits that are absent, range widely, and show trends over time. We show that substituting values for censored data can also bias analyses of 'big data' datasets, and that censored data can be handled effectively with modern quantitative approaches, but that such approaches rely on accurate metadata describing how each source treated its censored data.
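The abstract contrasts fixed-value substitution with "modern quantitative approaches" without naming one. As a minimal sketch of that contrast (our own illustration, not necessarily the authors' method), the Python below simulates lognormal concentrations from three hypothetical sources with different detection limits, then compares DL/2 substitution against a maximum-likelihood fit for left-censored data, computed by expectation-maximization (EM). The record format `(reported_value, is_censored)`, the function names, and all parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics

# Hypothetical record format: (reported_value, is_censored). For a censored
# row, reported_value holds the detection limit (DL) that applied to it,
# i.e. the row was reported as "<DL". Both fields are per-source metadata
# that the censored estimator below depends on.

SQRT2 = math.sqrt(2.0)


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-z / SQRT2)


def simulate(n, mu, sigma, detection_limits, seed=1):
    """Draw lognormal concentrations; record i comes from a hypothetical
    source whose DL is detection_limits[i % len(detection_limits)]."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        dl = detection_limits[i % len(detection_limits)]
        x = rng.lognormvariate(mu, sigma)
        records.append((dl, True) if x < dl else (x, False))
    return records


def substitution_estimate(records, fraction=0.5):
    """Replace each censored value by fraction * DL (DL/2 by default),
    then return the mean and SD of the log values."""
    logs = [math.log(v * fraction) if cens else math.log(v) for v, cens in records]
    return statistics.fmean(logs), statistics.pstdev(logs)


def censored_mle(records, iters=1000, tol=1e-10):
    """Maximum-likelihood fit of N(mu, sigma^2) to log values, treating each
    censored record as 'somewhere below its own DL', computed with EM."""
    obs = [math.log(v) for v, cens in records if not cens]
    lims = [math.log(v) for v, cens in records if cens]
    n = len(records)
    obs_s1 = sum(obs)
    obs_s2 = sum(x * x for x in obs)
    mu, sigma = substitution_estimate(records)  # starting values only
    for _ in range(iters):
        # E-step: expected X and X^2 for each censored record, using the
        # moments of a normal truncated above at its log detection limit c.
        s1, s2 = obs_s1, obs_s2
        for c in lims:
            a = (c - mu) / sigma
            lam = norm_pdf(a) / norm_cdf(a)
            mean_c = mu - sigma * lam
            var_c = sigma * sigma * (1.0 - a * lam - lam * lam)
            s1 += mean_c
            s2 += var_c + mean_c * mean_c
        # M-step: ordinary normal MLE from the expected sufficient statistics.
        new_mu = s1 / n
        new_sigma = math.sqrt(s2 / n - new_mu * new_mu)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


if __name__ == "__main__":
    # True log-scale parameters and three hypothetical sources' DLs.
    data = simulate(3000, mu=0.0, sigma=2.0, detection_limits=[0.5, 1.0, 4.0])
    print("censored fraction:", sum(c for _, c in data) / len(data))
    print("DL/2 substitution (mu, sigma):", substitution_estimate(data))
    print("censored MLE      (mu, sigma):", censored_mle(data))
```

EM is used here only because each step has a closed form (truncated-normal moments, then the usual normal MLE); any likelihood optimizer for censored data would serve. Note that the censored fit is possible only because every record carries a censoring flag and the detection limit that applied to it. If a source omits either, a "<DL" result cannot be told apart from a measured value, which is the metadata gap the abstract highlights.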
