Journal: BMC Genomics

GT-WGS: an efficient and economic tool for large-scale WGS analyses based on the AWS cloud service


Abstract

Whole-genome sequencing (WGS) plays an increasingly important role in clinical practice and public health. Because of the large data volume, WGS data analysis is usually both compute-intensive and IO-intensive. Currently it usually takes 30 to 40 h to finish a 50× WGS analysis task, which is far from the speed the industry requires. Furthermore, the high-end infrastructure required for WGS computing is costly in terms of time and money. In this paper, we aim to improve the time efficiency of WGS analysis and minimize its cost through elastic cloud computing. We developed a distributed system, GT-WGS, for large-scale WGS analyses using Amazon Web Services (AWS). Our system won first prize in the Wind and Cloud challenge held by the Genomics and Cloud Technology Alliance (GCTA) conference committee. The system makes full use of the dynamic pricing mechanism of AWS. We evaluated the performance of GT-WGS on a 55× WGS dataset (400 GB of fastq files) provided by the GCTA 2017 competition. In the best case, the analysis took only 18.4 min and the AWS cost of the whole process was only 16.5 US dollars. The results of GT-WGS are 99.9% consistent with those of the Genome Analysis Toolkit (GATK) best practice. We also evaluated GT-WGS on a real-world dataset provided by the XiangYa hospital, which consists of 5× whole-genome data for 500 samples; on average, GT-WGS finished one 5× WGS analysis task in 2.4 min at a cost of $3.6. WGS is already playing an important role in guiding therapeutic intervention, but its application is limited by time cost and computing cost. GT-WGS addresses this problem as an efficient and affordable WGS analysis tool. The demo video and supplementary materials of GT-WGS can be accessed at https://github.com/Genetalks/wgs_analysis_demo .
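To put the quoted figures side by side, the short sketch below works through the arithmetic implied by the abstract. It uses only numbers stated above. The comparison is approximate, since the 30 to 40 h baseline refers to 50× data while the 18.4 min run used 55× data, and the cohort total assumes the reported average of $3.6 applies to each of the 500 samples. All variable and function names are illustrative.

```python
# Figures quoted in the abstract; names are illustrative, not from GT-WGS.
BASELINE_HOURS = (30, 40)        # typical time for a 50x WGS analysis
GT_WGS_MINUTES_55X = 18.4        # best-case GT-WGS run on the 55x dataset
COST_PER_5X_SAMPLE_USD = 3.6     # average AWS cost per 5x sample
N_SAMPLES = 500                  # size of the XiangYa hospital cohort


def speedup_range(baseline_hours, minutes):
    """Return (low, high) speedup of a run taking `minutes` versus the baseline."""
    return tuple(hours * 60 / minutes for hours in baseline_hours)


low, high = speedup_range(BASELINE_HOURS, GT_WGS_MINUTES_55X)

# Assumption: every sample costs the reported average.
total_cost_usd = COST_PER_5X_SAMPLE_USD * N_SAMPLES

print(f"Approximate speedup: {low:.1f}x to {high:.1f}x")
print(f"Estimated cohort cost: ${total_cost_usd:.0f}")
```

Under these assumptions the best-case run is roughly two orders of magnitude faster than the quoted baseline, and the full 500-sample cohort comes to about $1800 in AWS charges.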
