Journal: GigaScience

Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

Abstract

Processing of next-generation sequencing (NGS) data requires significant technical skills, involving the installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. To address some of these challenges, developers have leveraged virtualization containers for the seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed two pipelines, for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers that we call Bio-Docklets. Each Bio-Docklet exposes a single data input endpoint and a single data output endpoint, so that, from a user perspective, running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed through integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts in any computing environment, whether a laboratory workstation, a university computer cluster, or a cloud service provider. Beyond end users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.
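To make the meta-script idea concrete, the following is a minimal sketch, not the authors' script, of how a container with one input and one output endpoint might be started and driven through BioBlend. Everything other than the BioBlend calls is a hypothetical placeholder: the helper names `DockletRun`, `plan_runs`, and `run_pipeline`, the image name, the assumption that Galaxy listens on port 80 inside the container, the mount points `/data/input` and `/data/output`, and the assumption that the stored workflow's first input step has index `"0"`.

```python
"""Minimal sketch of a Bio-Docklet-style "meta-script" (hypothetical, not the authors' code)."""
import subprocess
import time
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DockletRun:
    """One containerized pipeline instance with a single input and output endpoint."""

    image: str       # hypothetical Docker image with Galaxy and the pipeline preinstalled
    input_dir: str   # host directory holding the raw reads (the single input endpoint)
    output_dir: str  # host directory that receives results (the single output endpoint)
    host_port: int = 8080  # host port mapped to Galaxy, assumed to listen on port 80 inside

    def docker_command(self) -> List[str]:
        """Return the `docker run` arguments that start this instance in the background."""
        return [
            "docker", "run", "-d",
            "-p", f"{self.host_port}:80",
            "-v", f"{self.input_dir}:/data/input",    # hypothetical mount point
            "-v", f"{self.output_dir}:/data/output",  # hypothetical mount point
            self.image,
        ]

    def galaxy_url(self) -> str:
        return f"http://localhost:{self.host_port}"


def plan_runs(image: str, datasets: Sequence[Tuple[str, str]],
              base_port: int = 8080) -> List[DockletRun]:
    """Give each (input_dir, output_dir) pair its own instance on a distinct host port."""
    return [
        DockletRun(image, in_dir, out_dir, base_port + i)
        for i, (in_dir, out_dir) in enumerate(datasets)
    ]


def run_pipeline(run: DockletRun, api_key: str, workflow_name: str,
                 reads_path: str, timeout_s: int = 300) -> str:
    """Start the container, upload one input file, and invoke a stored Galaxy workflow.

    Returns the Galaxy invocation ID. Requires Docker and `pip install bioblend`.
    """
    from bioblend.galaxy import GalaxyInstance  # imported here so planning works without BioBlend

    subprocess.run(run.docker_command(), check=True)
    gi = GalaxyInstance(url=run.galaxy_url(), key=api_key)

    # Poll the Galaxy API until the server inside the container answers.
    deadline = time.time() + timeout_s
    while True:
        try:
            gi.config.get_version()
            break
        except Exception:
            if time.time() > deadline:
                raise TimeoutError("Galaxy did not become ready in time")
            time.sleep(5)

    # Put the single input into a fresh history.
    history_id = gi.histories.create_history(name=workflow_name)["id"]
    upload = gi.tools.upload_file(reads_path, history_id)
    dataset_id = upload["outputs"][0]["id"]

    # Look up the preinstalled workflow by name and run it on the uploaded dataset.
    workflow_id = gi.workflows.get_workflows(name=workflow_name)[0]["id"]
    invocation = gi.workflows.invoke_workflow(
        workflow_id,
        inputs={"0": {"src": "hda", "id": dataset_id}},  # assumes input step index "0"
        history_id=history_id,
    )
    return invocation["id"]
```

For concurrent analysis of several datasets, `plan_runs` assigns each instance its own host port, so that one `run_pipeline` call per planned instance does not collide with the others. BioBlend is imported inside `run_pipeline` so the planning helpers can be used, and tested, on machines without Docker or a Galaxy server.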