
Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2535 Human Genomes

Abstract

Population-scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remain a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in parallel, the data were processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of how cost and running time depend on data size suggests that, given near-linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future.
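The cost figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (2,535 samples, $18,590 for completed jobs, $21,805 for all jobs). The `extrapolate_cost` helper and the 10,000-sample cohort are hypothetical illustrations that assume strictly linear scaling, which the abstract describes only as "near-linear"; they are not taken from the paper.

```python
# Figures as stated in the abstract.
SAMPLES = 2535
COST_COMPLETED_USD = 18_590   # completed jobs only
COST_ALL_JOBS_USD = 21_805    # all jobs, including those that did not complete


def per_sample(total_usd: float, n_samples: int) -> float:
    """Average compute cost per sample in USD."""
    return total_usd / n_samples


def extrapolate_cost(n_samples: int) -> float:
    """Hypothetical helper: projected cost for a cohort of n_samples.

    Assumes cost grows exactly linearly with sample count, which is a
    simplification of the near-linear scaling reported in the abstract.
    """
    return n_samples * per_sample(COST_COMPLETED_USD, SAMPLES)


completed = per_sample(COST_COMPLETED_USD, SAMPLES)
all_jobs = per_sample(COST_ALL_JOBS_USD, SAMPLES)
overhead = COST_ALL_JOBS_USD - COST_COMPLETED_USD

print(f"Per sample (completed jobs): ${completed:.2f}")
print(f"Per sample (all jobs):       ${all_jobs:.2f}")
print(f"Spent on non-completed jobs: ${overhead:,}")
print(f"Illustrative 10,000-sample projection: ${extrapolate_cost(10_000):,.0f}")
```

The completed-jobs figure reproduces the $7.33 per sample reported in the abstract; the difference between the two totals is the amount spent on jobs that did not complete.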