Journal: Taxon

Semi-automated workflows for acquiring specimen data from label images in herbarium collections


Abstract

Computational workflow environments are an active area of computer science and informatics research; they promise to automate biological information processing effectively and thereby increase research efficiency and impact. In this project, semi-automated data processing workflows were developed to test the efficiency of computerizing information contained in herbarium plant specimen labels. Our test sample consisted of Mexican and Central American plant specimens held in the University of Michigan Herbarium (MICH). The initial data acquisition process consisted of two parts: (1) the capture of digital images of specimen labels and of full-specimen herbarium sheets, and (2) the creation of a minimal field database, or "pre-catalog", of records that contain only the information necessary to uniquely identify specimens. For entering "pre-catalog" data, two methods were tested: key-stroking the information (a) directly from the specimen labels, or (b) from digital images of the specimen labels. In a second step, locality and latitude/longitude data fields were filled in if the values were present on the labels or images. If values were not available, geo-coordinates were assigned based on further analysis of the descriptive locality information on the label. Time and effort for the various steps were measured and recorded. Our analysis demonstrates a clear efficiency benefit of dividing a biological specimen data acquisition workflow into discrete steps, each of which can then be optimized individually. First, we separated the step of capturing data from the specimen from most keystroke data entry tasks. We did this by capturing a digital image of the specimen in the first step, and by limiting initial key-stroking to the creation of a minimal "pre-catalog" database for the later tasks. By doing this, specimen handling logistics were streamlined to minimize staff time and cost.
Second, because most of the specimen data were then obtained from the label images, the more intellectually challenging task of label data interpretation could be moved electronically out of the herbarium to the location of more highly trained specialists, for greater efficiency and accuracy. This project used experts in the plants' country of origin, Mexico, to verify localities and geography and to derive geo-coordinates. Third, with careful choice of data fields for the "pre-catalog" database, specimen image files linked to the minimal tracking records could be sorted by collector and date of collection to minimize key-stroking of redundant data in a continuous series of labels, resulting in improved data entry efficiency and data quality.
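
The third point, sorting minimal tracking records by collector and collection date so that labels with shared data are keyed together, can be illustrated with a short Python sketch. This is not the authors' software: the abstract does not list the actual "pre-catalog" fields, so the record layout, field names, identifiers, and sample values below are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class PreCatalogRecord:
    """Minimal tracking record. All field names are hypothetical."""
    barcode: str                      # unique specimen identifier
    collector: str
    collection_date: date
    image_file: str                   # label image linked to this record
    locality: Optional[str] = None    # filled in during the second step


def entry_key(record):
    """Sort and grouping key: collector first, then collection date."""
    return (record.collector, record.collection_date)


def batch_by_collector_and_date(records):
    """Sort records, then group neighbours that share collector and date.

    Labels in one batch are likely to repeat locality data, so an
    operator can key the shared values once per batch.
    """
    ordered = sorted(records, key=entry_key)
    return [(key, list(group)) for key, group in groupby(ordered, key=entry_key)]


# Hypothetical sample data, in the arbitrary order the sheets were imaged.
records = [
    PreCatalogRecord("X0003", "Collector B", date(1990, 5, 2), "x0003.jpg"),
    PreCatalogRecord("X0001", "Collector A", date(1988, 3, 14), "x0001.jpg"),
    PreCatalogRecord("X0002", "Collector A", date(1988, 3, 14), "x0002.jpg"),
]

batches = batch_by_collector_and_date(records)
for (collector, collected_on), group in batches:
    print(collector, collected_on.isoformat(), [r.image_file for r in group])
```

Grouping happens after sorting because `itertools.groupby` only merges adjacent items. The sort is what places a collector's continuous series of labels next to each other.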
