Journal of Biomolecular Techniques: JBT

Nucleotide-Level Variant Analysis of Next-Generation Sequencing Data Using a Cloud-Based Data Analysis Pipeline

Abstract

To demonstrate the flexibility of a cloud-based solution for analyzing disparate sets of next-generation sequencing data, we examined carefully chosen samples across different populations from the 1000 Genomes Project (www.1000genomes.org) and conducted an extensive analysis of two Chinese populations, the “Chinese in Beijing” (CHB) and the “Chinese in metropolitan Denver” (CHD), each consisting of 28 exomes. Each dataset was uploaded into the system using raw data files acquired from the 1000 Genomes Project. Using these data and a cloud-based data analysis pipeline, we performed a nucleotide-level variant analysis combined with a population allele frequency analysis across all samples for the two populations. To identify alleles that differ significantly between the two populations, Pearson's chi-square test was applied, which yielded a total of 1.5 million SNPs, of which 84 were non-synonymous with a p-value of less than 0.01. Interestingly, the genes associated with non-synonymous variants in the CHD population were enriched for biological annotations such as endocrine system disorder, metabolic disease, cardiac fibrosis, and inflammation (including ZNF264, RPS6KA2, ROBO2, CRK, MUSK, CBL, and others). Furthermore, genes usually associated with liver injury were also identified for this population, suggesting that the liver is exposed to toxic agents more in this population than in the CHB population. The genomic differences observed between these two Chinese populations living in different parts of the world hint at a potential link between nutrition and different diseases (e.g., heart disease or metabolic diseases).
Using this analysis as a case study, we will demonstrate how a scalable computational infrastructure can provide researchers and sequencing service providers alike with a cost-effective and secure cloud-based computing platform as a powerful and collaborative technology solution for large-scale sequence data analysis and management.
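
The per-SNP comparison described in the abstract can be illustrated with a minimal sketch of Pearson's chi-square test on a 2x2 table of allele counts. The abstract does not say how the contingency table was built or whether a continuity correction was used, so the table layout (alternate vs. reference allele counts per population), the absence of a correction, the function name `allele_chi_square`, and the example counts are all assumptions made for illustration. The counts assume an autosomal site, where 28 diploid exomes give 56 chromosomes per population.

```python
import math


def allele_chi_square(alt_a, total_a, alt_b, total_b):
    """Pearson's chi-square test (1 degree of freedom, no continuity
    correction) on alternate/reference allele counts for two populations.

    Hypothetical helper for illustration; returns (chi2, p_value).
    """
    observed = [
        [alt_a, total_a - alt_a],  # population A: alt, ref
        [alt_b, total_b - alt_b],  # population B: alt, ref
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)

    # If one allele is absent from both populations (or a population has
    # no data), the site carries no signal and the test is undefined.
    if 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts, not data from the study.
chi2, p = allele_chi_square(alt_a=30, total_a=56, alt_b=12, total_b=56)
print(f"chi2={chi2:.2f}, p={p:.4g}, below_0.01={p < 0.01}")
```

Applied to every called SNP, a test of this kind gives one p-value per site, which can then be compared against a threshold such as the 0.01 cutoff quoted above.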
