Conference: IEEE International Conference on E-Science

SELFIE: Self-Aware Information Extraction from Digitized Biocollections

Abstract

Biological collections store information with broad societal and environmental impact. In the last 15 years, after worldwide investments and crowdsourcing efforts, 25% of the collected specimens have been digitized, a process that includes imaging the text attached to specimens and subsequently extracting information from the resulting images. This information extraction (IE) process is complex and therefore slow, and it typically involves human tasks. We propose a hybrid (Human-Machine) information extraction model that efficiently uses resources of different cost (machines, volunteers and/or experts) and speeds up the digitization of biocollections, while striving to maintain the same quality as human-only IE processes. In the proposed model, called SELFIE, self-aware IE processes determine whether their output quality is satisfactory. If the quality is unsatisfactory, additional or alternative processes that yield higher-quality output at higher cost are triggered. The effectiveness of this model is demonstrated by three SELFIE workflows for the extraction of Darwin Core terms from specimen images. Compared to the traditional human-driven IE approach, SELFIE workflows showed, on average, a 27% reduction in information-capture time and a 32% decrease in the required number of humans and their associated cost, while the quality of the results was reduced by a negligible 0.27%.
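The escalation idea in the abstract (accept a cheap process's output when its self-assessed quality is satisfactory, otherwise trigger a costlier one) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how quality is measured, so the confidence score, the thresholds, the relative costs, and every name below (`Stage`, `run_cascade`, `toy_machine`, `toy_volunteer`) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: an extractor maps an image id to
# (extracted value, self-assessed confidence in [0, 1]).
Extractor = Callable[[str], Tuple[Optional[str], float]]


@dataclass
class Stage:
    name: str          # illustrative label, e.g. "machine" or "volunteer"
    cost: float        # assumed relative cost of running the stage once
    extract: Extractor
    threshold: float   # minimum confidence at which the output is accepted


@dataclass
class Result:
    value: Optional[str]
    accepted_by: Optional[str]  # None if no stage met its threshold
    total_cost: float


def run_cascade(image_id: str, stages: List[Stage]) -> Result:
    """Run stages from cheapest to most expensive and stop at the first
    one whose self-assessed confidence meets its threshold."""
    total_cost = 0.0
    value: Optional[str] = None
    for stage in sorted(stages, key=lambda s: s.cost):
        value, confidence = stage.extract(image_id)
        total_cost += stage.cost
        if value is not None and confidence >= stage.threshold:
            return Result(value, stage.name, total_cost)
    # No stage was confident enough: return the last value, unaccepted.
    return Result(value, None, total_cost)


# Toy stand-ins for real processes (assumptions, not real OCR or crowdsourcing).
def toy_machine(image_id: str) -> Tuple[Optional[str], float]:
    if image_id == "clear":
        return "Quercus alba", 0.95
    return "Qu3rcus a1ba", 0.40


def toy_volunteer(image_id: str) -> Tuple[Optional[str], float]:
    return "Quercus alba", 0.90


STAGES = [
    Stage("machine", 1.0, toy_machine, 0.8),
    Stage("volunteer", 10.0, toy_volunteer, 0.8),
]

clear_result = run_cascade("clear", STAGES)
blurry_result = run_cascade("blurry", STAGES)
```

In this toy setup the clear image is resolved by the machine stage alone, while the blurry one falls through to the volunteer stage and accumulates both costs. The savings reported in the abstract would depend on how often the cheaper stages are accepted, which this sketch does not attempt to model.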