NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements

Abstract

A wealth of viral data sits untapped in publicly available metagenomic data sets, although it could be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset of 2953 SRA data sets (approximately 55 million contigs) was selected and further filtered for a minimum contig length of 1 kb. This resulted in 4.2 million contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered, and assigned metadata. Of these 4.2 million contigs, 360,000 were labeled with domains, and an additional subset of 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. The main lessons were: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure facilitates its use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.
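
The abstract names two steps that are easy to picture in code: the 1 kb minimum-length filter on contigs, and wrapper scripts that spread a weakly parallel tool across all cores of a node. The Python sketch below is only an illustration under stated assumptions, not the hackathon's actual pipeline. All function names (read_fasta, filter_by_length, split_round_robin, run_on_all_cores) are hypothetical, contigs are assumed to arrive as FASTA text, and the per-batch worker stands in for a call to an external single-threaded program.

```python
import io
import os
from concurrent.futures import ThreadPoolExecutor

MIN_LENGTH = 1000  # the 1 kb cutoff mentioned in the abstract


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=MIN_LENGTH):
    """Keep only contigs whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def split_round_robin(records, n_parts):
    """Deal records into at most n_parts non-empty batches."""
    parts = [records[i::n_parts] for i in range(n_parts)]
    return [p for p in parts if p]


def run_on_all_cores(records, worker, n_workers=None):
    """Apply worker to one batch per core and flatten the results.

    Assumption: in a real wrapper, worker would write its batch to a file
    and launch an external single-threaded tool (for example through
    subprocess.run), so each thread mostly waits on its own child process.
    """
    n_workers = n_workers or os.cpu_count() or 1
    batches = split_round_robin(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker, batches))
    return [item for batch in results for item in batch]


def contig_lengths(batch):
    """Stand-in worker: report the length of each contig in the batch."""
    return [(header, len(seq)) for header, seq in batch]


if __name__ == "__main__":
    demo = ">short\n" + "A" * 500 + "\n>long\n" + "ACGT" * 300 + "\n"
    kept = filter_by_length(list(read_fasta(io.StringIO(demo))))
    print(run_on_all_cores(kept, contig_lengths))
```

Threads are used here only because the assumed worker would spend its time waiting on a child process; a worker doing heavy computation in Python itself would need separate processes to occupy all cores.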


Bibliographic details

Journal: Genes
Authors:
Ryan Connor; Rodney Brister; Jan P. Buchmann; Ward Deboutte; Rob Edwards; Joan Martí-Carreras; Mike Tisza; Vadim Zalunin; Juan Andrade-Martínez; Adrian Cantu; Michael D’Amour; Alexandre Efremov; Lydia Fleischmann; Laura Forero-Junco; Sanzhima Garmaeva; Melissa Giluso; Cody Glickman; Margaret Henderson; Benjamin Kellman; David Kristensen; Carl Leubsdorf; Kyle Levi; Shane Levi; Suman Pakala; Vikas Peddu; Alise Ponsero; Eldred Ribeiro; Farrah Roy; Lindsay Rutter; Surya Saha; Migun Shakya; Ryan Shean; Matthew Miller; Benjamin Tully; Christopher Turkington; Ken Youens-Clark; Bert Vanmechelen; Ben Busby;
Year (volume), issue: 2019 (10), 9
Year: 2019
Page: 714
Total pages: 18
Original format: PDF
Chinese Library Classification: Biochemical genetics; Biochemical pharmacology
Keywords: metagenomic; viruses; SRA; STRIDES; hackathon; infrastructure; cloud computing

Similar literature

1. NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements [J]. Ryan Connor, Rodney Brister, Jan P. Buchmann. Genes. 2019, No. 9
2. A Dynamic Cloud Discovery Framework for Deploying of Scientific Computing Services over a Multi-cloud Infrastructure [J]. C.D. Karthic, S. Sujatha, V. Praveenkumar. Journal of Artificial Intelligence. 2012, No. 4
3. Identifying community priorities for neighborhood livability: Engaging neighborhood residents to facilitate community assessment [J]. Reyes David, Meyer Karen. Public health nursing. 2020, No. 1
4. Process Discovery from Event Stream Data in the Cloud - A Scalable, Distributed Implementation of the Flexible Heuristics Miner on the Amazon Kinesis Cloud Infrastructure [C]. Joerg Evermann, Jana-Rebecca Rehse, Peter Fettke. IEEE International Conference on Cloud Computing Technology and Science. 2016
5. Discovery and Sequencing of Novel and Identified Mosquito-Associated Viruses and Genetic Determinants of Flavivirus Host Specificity [D]. Charles, Jermilia. 2019
6. Cloud Infrastructures for In Silico Drug Discovery: Economic and Practical Aspects [O]. Daniele D'Agostino, Andrea Clematis, Alfonso Quarati. 2006
7. NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements [O]. Ryan Connor, Rodney Brister, Jan Buchmann. 2019
8. Intranet, Internet, and Cloud Computing: Identifying Weak Spots in Our Technological Infrastructure [R]. Bingue, E. W., Cook, D. A. 2011
