BMC Bioinformatics

Impact of concurrency on the performance of a whole exome sequencing pipeline

Abstract

Current high-throughput technologies (e.g. whole genome sequencing, RNA-Seq, ChIP-Seq) generate huge amounts of data, and their use becomes more widespread with each passing year. Complex analysis pipelines involving several computationally intensive steps have to be applied to an increasing number of samples. Workflow management systems allow parallelization and a more efficient use of computational power. Nevertheless, this mostly happens by assigning the available cores to the pipeline of a single sample, or of a few samples, at a time. We refer to this approach as the naive parallel strategy (NPS). Here, we discuss an alternative approach, which we refer to as the concurrent execution strategy (CES), which distributes the available processors equally across every sample's pipeline. Theoretically, we show that the CES results, under loose conditions, in a substantial speedup, with an ideal gain ranging from 1 to the number of samples. Also, we observe that the CES yields even faster executions since parallelly computable tasks scale sub-linearly. Practically, we tested both strategies on a whole exome sequencing pipeline applied to three publicly available matched tumour-normal sample pairs of gastrointestinal stromal tumour. The CES achieved latency speedups of up to 2–2.4 compared with the NPS. Our results hint that if the distribution of resources is further tailored to fit specific situations, an even greater gain in performance could be achieved when executing pipelines for multiple samples. For this to be feasible, a benchmarking of the tools included in the pipeline would be necessary. It is our opinion that these benchmarks should be consistently performed by the tools' developers. Finally, these results suggest that concurrent strategies might also lead to energy and cost savings by making the use of low-power machine clusters feasible.
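
The claimed gain range, from 1 to the number of samples, can be illustrated with a small sketch. It is not the authors' derivation: it assumes an Amdahl-style scaling model for a single sample's pipeline, and the function names, the `parallel_fraction` parameter and the numbers used are all hypothetical. It also assumes there are at least as many cores as samples.

```python
def pipeline_time(serial_time, parallel_fraction, cores):
    """Run time of one sample's pipeline on `cores` cores.

    Assumption: Amdahl-style scaling, where only `parallel_fraction`
    of the single-core run time benefits from extra cores.
    """
    return serial_time * ((1.0 - parallel_fraction) + parallel_fraction / cores)


def nps_latency(n_samples, total_cores, serial_time, parallel_fraction):
    """NPS: samples run one after another, each using every core."""
    return n_samples * pipeline_time(serial_time, parallel_fraction, total_cores)


def ces_latency(n_samples, total_cores, serial_time, parallel_fraction):
    """CES: all samples run at once, each with an equal share of the cores."""
    cores_per_sample = total_cores / n_samples  # assumes total_cores >= n_samples
    return pipeline_time(serial_time, parallel_fraction, cores_per_sample)


def ces_speedup(n_samples, total_cores, serial_time, parallel_fraction):
    """Latency of the NPS divided by latency of the CES."""
    nps = nps_latency(n_samples, total_cores, serial_time, parallel_fraction)
    ces = ces_latency(n_samples, total_cores, serial_time, parallel_fraction)
    return nps / ces


if __name__ == "__main__":
    # Hypothetical setup: 6 samples, 24 cores, 10 h single-core run time.
    for fraction in (1.0, 0.9, 0.5, 0.0):
        print(fraction, ces_speedup(6, 24, 10.0, fraction))
```

Under this model, a perfectly parallel pipeline (`parallel_fraction = 1.0`) gains nothing from the CES, while a fully serial one (`parallel_fraction = 0.0`) gains a factor equal to the number of samples. Real tools fall somewhere in between, which is consistent with the abstract's call for per-tool benchmarks.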
