Large-scale digitization of herbarium specimens: Development and usage of an automated, high-throughput conveyor system

Sweeney Patrick W.; Starly Binil; Morris Paul J.; Xu Yiming; Jones Aimee; Radhakrishnan Sridhar; Grassa Christopher J.; Davis Charles C.

Abstract

The billions of specimens housed in natural science collections provide a tremendous source of under-utilized data that are useful for scientific research, conservation, commerce, and education. Digitization and mobilization of specimen data and images promise to greatly accelerate their utilization. While digitization of natural science collection specimens has been occurring for decades, the vast majority of specimens remain un-digitized. If the digitization task is to be completed in the near future, innovative, high-throughput approaches are needed. To create a dataset for the study of global change in New England, we designed and implemented an industrial-scale, conveyor-based digitization workflow for herbarium specimen sheets. The workflow is a variation of an object-to-image-to-data workflow that prioritizes imaging and the capture of storage container-level data. The workflow utilizes a novel conveyor system developed specifically for the task of imaging flattened herbarium specimens. Using our workflow, we imaged and transcribed specimen-level data for almost 350,000 specimens over a 131-week period; an additional 56 weeks was required for storage container-level data capture. Our project has demonstrated that it is possible to capture both an image of a specimen and a core database record in 35 seconds per herbarium sheet (for intervals between images of 30 minutes or less) plus some additional overhead for container-level data capture. This rate was in line with the pre-project expectations for our approach. Our throughput rates are comparable with those of some other similar, high-throughput approaches focused on digitizing herbarium sheets and are as much as three times faster than rates achieved with more conventional non-automated approaches used during the project. We report on challenges encountered during development and use of our system and discuss ways in which our workflow could be improved.
The conveyor apparatus software, database schema, configuration files, hardware list, and conveyor schematics are available for download on GitHub.


Bibliographic details

Source
Taxon, 2018, Issue 1 (14 pages)
Authors
Sweeney Patrick W.; Starly Binil; Morris Paul J.; Xu Yiming; Jones Aimee; Radhakrishnan Sridhar; Grassa Christopher J.; Davis Charles C.
Author affiliations

Yale Univ Peabody Museum Nat Hist Div Bot POB 208118 New Haven CT 06520 USA;

North Carolina State Univ Edward P Fitts Dept Ind & Syst Engn Raleigh NC 27607 USA;

Harvard Univ Museum Comparat Zool Cambridge MA 02138 USA;

Univ Oklahoma Sch Comp Sci Norman OK 73019 USA;

Univ Oklahoma Sch Ind & Syst Engn Norman OK 73019 USA;

Univ Oklahoma Sch Comp Sci Norman OK 73019 USA;

Harvard Univ Harvard Univ Herbaria Cambridge MA 02138 USA;

Harvard Univ Harvard Univ Herbaria Cambridge MA 02138 USA;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Plant taxonomy (systematic botany)
Keywords
automation; biodiversity informatics; digitization; herbarium specimens; imaging; New England; transcription; workflows;

