20 GB in 10 minutes: a case for linking major biodiversity databases using an open socio-technical infrastructure and a pragmatic cross-institutional collaboration



Abstract

Biodiversity information is made available through numerous databases, each with its own data model, web services, and data types. Combining data across databases leads to new insights, but is not easy because each database uses its own system of identifiers. In the absence of stable and interoperable identifiers, databases are often linked using taxonomic names. This labor-intensive, error-prone, and lengthy process relies on accessible versions of nomenclatural authorities and fuzzy-matching algorithms. Linking diverse data takes more than technology. New social collaborations are needed to make concrete progress on finding relationships between biodiversity datasets. One such collaboration is the Global Unified Open Data Architecture (GUODA), which combines the skills of computer engineers from iDigBio, server resources from the Advanced Computing and Information Systems (ACIS) Lab, global-scale data presentation from EOL, and independent developers and researchers. This paper discusses a technical solution developed by the GUODA collaboration for faster linking across databases, with a use case linking Wikidata and the Global Biotic Interactions database (GloBI). The GUODA infrastructure is a 12-node high-performance computing cluster with about 192 threads, 12 TB of storage, and 288 GB of memory. Using GUODA, 20 GB of compressed JSON from Wikidata was processed and linked to GloBI in about 10–11 min. Instead of comparing name strings or relying on a single identifier, Wikidata and GloBI were linked by comparing graphs of biodiversity identifiers external to each system. This method added 119,957 Wikidata links to GloBI, an increase of 13.7% of all outgoing name links in GloBI. Wikidata and GloBI were compared to the Open Tree of Life Reference Taxonomy to examine consistency and coverage. Parsing the Wikidata, Open Tree of Life Reference Taxonomy, and GloBI archives and calculating consistency metrics took minutes on the GUODA platform. As a model collaboration, GUODA has the potential to revolutionize biodiversity science by bringing diverse technically minded people together with high-performance computing resources that are accessible from a laptop or desktop. However, participating in such a collaboration still requires basic programming skills.
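
The idea of linking records through shared external identifiers, rather than through name strings, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `link_by_shared_ids`, the record layout, and every identifier shown are hypothetical placeholders chosen for illustration, and the real pipeline ran on the GUODA cluster over full archives.

```python
from collections import defaultdict


def link_by_shared_ids(wikidata, globi):
    """Link records whose sets of external identifiers overlap.

    wikidata: dict mapping a Wikidata item ID to a set of external IDs
    globi:    dict mapping a GloBI name ID to a set of external IDs
    Returns a dict mapping each linked GloBI ID to a sorted list of
    Wikidata item IDs. All inputs here are assumed, simplified shapes.
    """
    # Index Wikidata items by every external identifier they carry.
    index = defaultdict(set)
    for item, ext_ids in wikidata.items():
        for ext in ext_ids:
            index[ext].add(item)

    # A GloBI record links to any item that shares at least one identifier.
    links = {}
    for name, ext_ids in globi.items():
        matches = set()
        for ext in ext_ids:
            matches |= index.get(ext, set())
        if matches:
            links[name] = sorted(matches)
    return links


# Hypothetical placeholder identifiers, not real database entries.
wikidata = {
    "WD:Q1": {"NCBI:111", "ITIS:222"},
    "WD:Q2": {"NCBI:333"},
}
globi = {
    "GLOBI:A": {"ITIS:222", "EOL:444"},  # shares ITIS:222 with WD:Q1
    "GLOBI:B": {"EOL:555"},              # shares nothing, stays unlinked
}

links = link_by_shared_ids(wikidata, globi)
print(links)
```

Because the match runs through identifiers external to both systems, two records can be linked even when their name strings differ, which is the property the abstract contrasts with fuzzy name matching.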


Bibliographic details

Journal: PeerJ Computer Science
Authors: Anne E. Thessen; Jorrit H. Poelen; Matthew Collins; Jen Hammock
Year: 2018
Total pages: 15
Original format: PDF
Keywords: biodiversity; collaboration; identifiers; Wikidata; graph; linking
Record added: 2022-08-21 12:36:11

