Journal: Taxon

Large-scale digitization of herbarium specimens: Development and usage of an automated, high-throughput conveyor system


Abstract

The billions of specimens housed in natural science collections provide a tremendous source of under-utilized data that are useful for scientific research, conservation, commerce, and education. Digitization and mobilization of specimen data and images promise to greatly accelerate their utilization. While natural science collection specimens have been digitized for decades, the vast majority of specimens remain undigitized. If the digitization task is to be completed in the near future, innovative, high-throughput approaches are needed. To create a dataset for the study of global change in New England, we designed and implemented an industrial-scale, conveyor-based digitization workflow for herbarium specimen sheets. The workflow is a variation of an object-to-image-to-data workflow that prioritizes imaging and the capture of storage container-level data. It uses a novel conveyor system developed specifically for imaging flattened herbarium specimens. Using this workflow, we imaged and transcribed specimen-level data for almost 350,000 specimens over a 131-week period; an additional 56 weeks were required for storage container-level data capture. Our project has demonstrated that it is possible to capture both an image of a specimen and a core database record in 35 seconds per herbarium sheet (for intervals between images of 30 minutes or less), plus some additional overhead for container-level data capture. This rate was in line with our pre-project expectations. Our throughput rates are comparable to those of other high-throughput approaches to digitizing herbarium sheets and are as much as three times faster than the rates achieved with more conventional, non-automated approaches used during the project. We report on challenges encountered during the development and use of our system and discuss ways in which our workflow could be improved.
The conveyor apparatus software, database schema, configuration files, hardware list, and conveyor schematics are available for download on GitHub.
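A back-of-the-envelope sketch of what the reported 35-second rate implies, using only figures stated in the abstract. It assumes the rate applied uniformly to all ~350,000 sheets and ignores both the container-level overhead and the longer gaps the rate excludes, so the result is an idealized estimate of sheet-handling time, not a figure reported by the authors. The function names are hypothetical.

```python
# Idealized throughput arithmetic from the abstract's figures (assumptions noted).
SECONDS_PER_SHEET = 35       # reported rate: image + core record per sheet
SPECIMENS = 350_000          # "almost 350,000"; rounded up here for simplicity
SPECIMEN_PHASE_WEEKS = 131   # weeks spent on imaging and specimen-level transcription


def sheet_hours(specimens: int, seconds_per_sheet: float) -> float:
    """Total hours of per-sheet work at a fixed per-sheet rate (no overhead)."""
    return specimens * seconds_per_sheet / 3600


def sheets_per_hour(seconds_per_sheet: float) -> float:
    """Sheets processed per hour at a fixed per-sheet rate."""
    return 3600 / seconds_per_sheet


total_hours = sheet_hours(SPECIMENS, SECONDS_PER_SHEET)
hours_per_week = total_hours / SPECIMEN_PHASE_WEEKS

print(f"{total_hours:.0f} h of sheet handling in total")
print(f"{hours_per_week:.1f} h per week over {SPECIMEN_PHASE_WEEKS} weeks")
print(f"{sheets_per_hour(SECONDS_PER_SHEET):.1f} sheets per hour")
```

Under these assumptions the per-sheet work comes to roughly 3,400 hours, or about 26 hours of active imaging per week across the 131-week specimen phase, at roughly 103 sheets per hour.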

