IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

BIOPET: Towards Scalable, Maintainable, User-Friendly, Robust and Flexible NGS Data Analysis Pipelines



Abstract

Because the cost of sequencing is falling rapidly, more and more research and clinical institutes are generating Next Generation Sequencing (NGS) data at an impressive and increasing scale. University Medical Centers in the Netherlands each sequence thousands of patients a year as part of their routine diagnostics. On the research front, the GoNL and BIOS projects coordinated by the BBMRI-NL consortium have sequenced 770 whole-genome DNA samples and over 4,000 RNA samples collected from a number of Dutch biobanks. In 2016, the deployment of an Illumina X Ten sequencer at the Hartwig Medical Foundation provided a sequencing capacity of 18,000 whole-genome DNA samples per year. Processing these petabyte-scale datasets requires revolutionary thinking and new solutions in the computing and storage infrastructure as well as in the data analysis pipelines. At Leiden University Medical Center, we have developed an open-source pipeline framework based on GATK-Queue: BIOPET (Bioinformatics Pipeline Execution Toolkit). We implemented all of our commonly used NGS tools as Queue modules in the form of Scala classes. Together with the tools already supported in GATK-Queue, such as GATK variant calling and the Picard tools, this gives us a full set of NGS tools available as Scala classes, which are further combined into pipeline functions. Besides meeting the standard requirements for NGS pipelines, such as reentrancy, the BIOPET framework offers a number of advanced features, including live debugging, test and meta-analysis frameworks, and easy deployment. Through its DRMAA support, BIOPET can run on various types of HPC infrastructure managed by schedulers such as SGE, SLURM and PBS.
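The abstract combines three ideas. Each tool is wrapped as a class. Tool wrappers are composed into pipeline functions. Pipelines are reentrant, so a rerun resumes where an earlier run stopped. The following is a minimal sketch of those ideas in Python. It is not BIOPET's or GATK-Queue's actual (Scala) API. All names (ToolJob, Pipeline, align, call_variants, germline_pipeline) are hypothetical, and the "tools" only write placeholder files. Jobs run locally in the order they were added rather than being submitted through DRMAA. The sketch decides whether a job can be skipped by comparing file modification times, similar to make. That is an assumption for illustration, not necessarily how BIOPET detects finished work.

```python
"""Hypothetical sketch: tool wrappers composed into a reentrant pipeline.

Not the BIOPET or GATK-Queue API; names and the mtime-based skip rule are
illustrative assumptions.
"""
import os
import tempfile
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToolJob:
    """One tool invocation with declared inputs, outputs and an action."""
    name: str
    inputs: List[str]
    outputs: List[str]
    action: Callable[[], None]  # stands in for a command line sent to a scheduler

    def is_up_to_date(self) -> bool:
        # Skippable only if every output exists and is at least as new as
        # every input; this is what lets a rerun resume instead of restart.
        if not self.outputs or not all(os.path.exists(o) for o in self.outputs):
            return False
        newest_input = max((os.path.getmtime(i) for i in self.inputs), default=0.0)
        oldest_output = min(os.path.getmtime(o) for o in self.outputs)
        return oldest_output >= newest_input


@dataclass
class Pipeline:
    jobs: List[ToolJob] = field(default_factory=list)

    def add(self, job: ToolJob) -> None:
        self.jobs.append(job)

    def run(self) -> List[str]:
        """Run jobs in insertion order, skipping up-to-date ones; return names run."""
        executed = []
        for job in self.jobs:
            if job.is_up_to_date():
                continue
            job.action()
            executed.append(job.name)
        return executed


def write_file(path: str, text: str) -> None:
    with open(path, "w") as fh:
        fh.write(text)


def align(reads: str, bam: str) -> ToolJob:
    # Hypothetical wrapper; a real one would build an aligner command line.
    return ToolJob("align", [reads], [bam], lambda: write_file(bam, "aligned\n"))


def call_variants(bam: str, vcf: str) -> ToolJob:
    # Hypothetical wrapper standing in for a variant caller.
    return ToolJob("call_variants", [bam], [vcf], lambda: write_file(vcf, "variants\n"))


def germline_pipeline(reads: str, out_dir: str) -> Pipeline:
    """Compose tool wrappers into one pipeline function."""
    bam = os.path.join(out_dir, "sample.bam")
    vcf = os.path.join(out_dir, "sample.vcf")
    pipeline = Pipeline()
    pipeline.add(align(reads, bam))
    pipeline.add(call_variants(bam, vcf))
    return pipeline


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reads = os.path.join(tmp, "sample.fastq")
        write_file(reads, "@read1\nACGT\n+\nIIII\n")
        print(germline_pipeline(reads, tmp).run())  # first run: both jobs execute
        print(germline_pipeline(reads, tmp).run())  # rerun: nothing left to do
```

Each wrapper declares its inputs and outputs, so the runner can skip finished work without knowing anything about the tool itself. A real framework would also derive the execution order from those declarations instead of relying on the order in which jobs are added.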
