Journal of Biomolecular Techniques : JBT

Nucleotide-Level Variant Analysis of Next-Generation Sequencing Data Using a Cloud-Based Data Analysis Pipeline

Abstract

To demonstrate the flexibility of a cloud-based solution for analyzing disparate sets of next-generation sequencing data, we examined carefully chosen samples from different populations in the 1,000 Genomes Project and conducted an extensive analysis of two Chinese populations, the “Chinese in Beijing” (CHB) and the “Chinese in metropolitan Denver” (CHD), each consisting of 28 exomes. Each dataset was uploaded into the system as raw data files acquired from the 1,000 Genomes Project. Using these data and a cloud-based data analysis pipeline, we performed a nucleotide-level variant analysis combined with a population allele frequency analysis across all samples for the two populations. To identify alleles that differ significantly between the two populations, we applied Pearson's chi-square test. The analysis yielded a total of 1.5 million SNPs, of which 84 were non-synonymous with a p-value of less than 0.01. Interestingly, the genes associated with non-synonymous variants in the Chinese in metropolitan Denver population were enriched for biological annotations such as endocrine system disorder, metabolic disease, cardiac fibrosis, and inflammation (including ZNF264, RPS6KA2, ROBO2, CRK, MUSK, CBL, and others). Furthermore, genes usually associated with liver injury were also identified for this population, suggesting that the liver is exposed to toxic agents to a greater extent in this population than in the CHB population. The observed genomic differences between these two Chinese populations living in different parts of the world hint at a potential link between nutrition and different diseases (e.g. heart disease or metabolic diseases).
Using this analysis as a case study, we will demonstrate how a scalable computational infrastructure can provide researchers and sequencing service providers alike with a cost-effective and secure cloud-based computing platform as a powerful and collaborative technology solution for large-scale sequence data analysis and management.
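
The per-variant comparison described in the abstract can be illustrated with a minimal sketch of Pearson's chi-square test on a 2x2 table of allele counts (reference vs. alternate allele, CHB vs. CHD). This is not the pipeline's actual implementation: the function name `allele_chi_square`, the example counts, and the choice to omit a continuity correction are all assumptions made for illustration. The only figure taken from the abstract is the sample size, 28 diploid exomes per population, which gives at most 56 alleles per population at an autosomal site.

```python
import math


def allele_chi_square(ref_a, alt_a, ref_b, alt_b):
    """Pearson's chi-square test (1 degree of freedom, no continuity
    correction) on a 2x2 table of allele counts for two populations.

    Returns (chi2 statistic, p-value). Hypothetical helper, not part of
    any published pipeline.
    """
    observed = [[ref_a, alt_a], [ref_b, alt_b]]
    row_totals = [ref_a + alt_a, ref_b + alt_b]   # alleles per population
    col_totals = [ref_a + ref_b, alt_a + alt_b]   # ref / alt across both
    total = sum(row_totals)

    # A zero margin (e.g. a site with no alternate alleles in either
    # population) carries no evidence of a frequency difference.
    if 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function has the
    # closed form erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts for one SNP: 56 alleles per population (28 exomes x 2).
chi2, p = allele_chi_square(ref_a=46, alt_a=10, ref_b=30, alt_b=26)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, significant at 0.01: {p < 0.01}")
```

With counts this small, expected cell values can fall below the usual rule-of-thumb threshold of 5 for rare variants, in which case an exact test may be more appropriate. Applying a 0.01 cutoff across many SNPs also raises multiple-testing concerns that this sketch does not address.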