Benchmarking Undedicated Cloud Computing Providers for Analysis of Genomic Datasets

Abstract

A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means by which small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5–78.2) for E. coli and 53.5% (95% CI: 34.4–72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment also differed significantly between the two services: EMR was 257.3% (95% CI: 211.5–303.1) and 173.9% (95% CI: 134.6–213.1) more expensive for the E. coli and human assemblies, respectively. Thus, GCE outperformed EMR in terms of both cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for the analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we provide ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.