The Dockstore: enabling modular community-focused sharing of Docker-based genomics tools and workflows

Abstract

As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore ( ), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).