BIOPET: Towards Scalable, Maintainable, User-Friendly, Robust and Flexible NGS Data Analysis Pipelines

Abstract

Because of the rapidly decreasing cost of sequencing, more and more research and clinical institutes are generating Next Generation Sequencing (NGS) data at an impressive and growing scale. University Medical Centers in the Netherlands each sequence thousands of patients a year as part of their routine diagnosis. On the research front, the GoNL and BIOS projects, coordinated by the BBMRI-NL consortium, have sequenced 770 whole-genome DNA samples and over 4,000 RNA samples collected from a number of Dutch biobanks. In 2016, the deployment of an Illumina X Ten sequencer at the Hartwig Medical Foundation provided a sequencing capacity of 18,000 whole-genome DNA samples per year. Processing these petabyte-scale datasets requires revolutionary thinking and new solutions in computing and storage infrastructure and in data analysis pipelines. At Leiden University Medical Center, we have developed BIOPET (Bioinformatics Pipeline Execution Toolkit), an open-source pipeline framework based on GATK-Queue. We implemented all our commonly used NGS tools as Queue modules in the form of Scala classes. Together with the tools already supported in GATK-Queue, such as GATK variant calling and Picard tools, this gives us a full set of NGS tools as Scala classes, which are further combined into pipeline functions. Besides meeting the standard requirements for NGS pipelines, such as reentrancy, the BIOPET framework offers a range of advanced features, including live debugging, testing and meta-analysis frameworks, and easy deployment. Through its DRMAA support, BIOPET can run on various types of HPC infrastructure managed by job schedulers such as SGE, SLURM and PBS.
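The abstract describes wrapping each NGS tool as a Scala class, combining those classes into pipeline functions, and reentrancy as a standard pipeline requirement. The sketch below illustrates these ideas in plain Scala under stated assumptions: it does not use GATK-Queue's or BIOPET's actual classes, and the names `Job`, `BwaMem`, `SamtoolsSort` and `Pipeline.plan` are hypothetical. Dependencies are inferred by matching one job's outputs to another job's inputs, and a job is planned only if one of its outputs is missing or an upstream job is re-run. File existence is simulated with a `Set` so the example is self-contained.

```scala
// Hypothetical sketch only: these are NOT GATK-Queue or BIOPET classes.

/** A wrapped command-line tool: declared inputs, outputs and the command to run. */
trait Job {
  def inputs: Seq[String]
  def outputs: Seq[String]
  def commandLine: String
}

/** Illustrative wrapper: align a read pair with `bwa mem`, redirecting SAM output to a file. */
final case class BwaMem(reference: String, r1: String, r2: String, outSam: String) extends Job {
  def inputs: Seq[String] = Seq(reference, r1, r2)
  def outputs: Seq[String] = Seq(outSam)
  def commandLine: String = s"bwa mem $reference $r1 $r2 > $outSam"
}

/** Illustrative wrapper: coordinate-sort a SAM file into a BAM file with `samtools sort`. */
final case class SamtoolsSort(inSam: String, outBam: String) extends Job {
  def inputs: Seq[String] = Seq(inSam)
  def outputs: Seq[String] = Seq(outBam)
  def commandLine: String = s"samtools sort -o $outBam $inSam"
}

object Pipeline {
  /** Returns the jobs to run, in dependency order.
    * A job runs if any of its outputs is missing from `existing` or if a job
    * producing one of its inputs runs (reentrancy: finished work is skipped).
    * `existing` stands in for a file-system check so the sketch is self-contained.
    */
  def plan(jobs: Seq[Job], existing: Set[String]): Seq[Job] = {
    // Map each output file to the job that produces it.
    val producer: Map[String, Job] = jobs.flatMap(j => j.outputs.map(out => out -> j)).toMap

    // Depth-first topological sort; `path` holds the jobs on the current path to detect cycles.
    val ordered = scala.collection.mutable.LinkedHashSet.empty[Job]
    def visit(job: Job, path: Set[Job]): Unit = {
      require(!path.contains(job), s"dependency cycle at: ${job.commandLine}")
      if (!ordered.contains(job)) {
        job.inputs.collect(producer).foreach(dep => visit(dep, path + job))
        ordered += job
      }
    }
    jobs.foreach(job => visit(job, Set.empty))

    // Walk in dependency order, propagating re-runs downstream.
    val toRun = scala.collection.mutable.Set.empty[Job]
    for (job <- ordered) {
      val upstreamRerun = job.inputs.collect(producer).exists(toRun.contains)
      val outputMissing = !job.outputs.forall(existing.contains)
      if (upstreamRerun || outputMissing) toRun += job
    }
    ordered.toSeq.filter(toRun.contains)
  }
}
```

For example, with the two jobs above for a sample `s1`, `existing = Set("s1.sam")` plans only the `samtools sort` job, while `existing = Set("s1.bam")` plans both, because the missing SAM file forces BWA to re-run and that re-run propagates downstream. A production framework would presumably check the real file system and hand the planned commands to a cluster scheduler, for instance through DRMAA.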

Bibliographic details

Source: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017, pp. 823-829 (7 pages)
Authors: Peter Vant Hof; Wibowo Arindrarto; Sander Bollen; Szymon Kielbasa; Jeroen Laros; Hailiang Mei
Original format: PDF
Keywords: Grid computing

Similar documents

1. LLAMA: a robust and scalable machine learning pipeline for analysis of large scale 4D microscopy data: analysis of cell ruffles and filopodia [J]. Lefevre James G., Koh Yvette W. H., Wall Adam A. BMC Bioinformatics, 2021, issue 1.
2. WinProphet: A User-Friendly Pipeline Management System for Proteomics Data Analysis Based on Trans-Proteomic Pipeline [J]. Chen Ching-Tai, Ko Chu-Ling, Choong Wai-Kok. Analytical Chemistry, 2019, issue 15.
3. ExUTR: a novel pipeline for large-scale prediction of 3′-UTR sequences from NGS data [J]. Zixia Huang, Emma C. Teeling. BMC Genomics, 2017, issue 1.
4. BIOPET: towards Scalable, Maintainable, User-friendly, Robust and Flexible NGS data analysis pipelines [C]. Peter vant Hof, Wibowo Arindrarto, Sander Bollen. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017.
5. A Robust scRNA-seq Data Analysis Pipeline for Measuring Gene Expression Noise [D]. Balachandran, Parithi. 2017.
6. LLAMA: a robust and scalable machine learning pipeline for analysis of large scale 4D microscopy data: analysis of cell ruffles and filopodia [O]. James G. Lefevre, Yvette W. H. Koh, Adam A. Wall. 2021.
7. Cascabel: a flexible, scalable and easy-to-use amplicon sequence data analysis pipeline [O]. Alejandro Abdala Asbun, Marc A Besseling, Sergio Balzano. 2019.