Journal: American Journal of Human Genetics

Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

Abstract

Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
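The core step the abstract describes is placing samples into a low-dimensional reference map by Procrustes-transforming higher-dimensional principal components. The minimal Python/NumPy sketch below illustrates it under these assumptions:

- `ref_low` holds the reference individuals' coordinates in the K-dimensional reference ancestry map.
- `ref_high` holds the same individuals' coordinates in a q-dimensional PCA (q ≥ K) computed together with a study sample.
- The scale ρ, orthonormal projection A (q × K) and translation b are fitted on the reference individuals. They are then applied to a study sample's q-dimensional coordinates.

The solver is a generic alternating (zero-padding) algorithm for projection Procrustes and is not necessarily the one used in LASER 2.0. All function names and the simulated data are hypothetical.

```python
import numpy as np


def orthogonal_procrustes_fit(source, target):
    """Least-squares fit of target ~ rho * source @ rot + b, rot orthogonal.

    source and target have shape (n, q) with rows for the same individuals.
    Reflections are allowed, since principal-component signs are arbitrary.
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    src_c = source - src_mean
    tgt_c = target - tgt_mean
    # The SVD of the cross-product matrix gives the optimal orthogonal map.
    u, s, vt = np.linalg.svd(src_c.T @ tgt_c)
    rot = u @ vt
    rho = s.sum() / (src_c ** 2).sum()
    b = tgt_mean - rho * src_mean @ rot
    return rho, rot, b


def projection_procrustes(ref_low, ref_high, n_iter=1000, tol=1e-12):
    """Fit ref_low ~ rho * ref_high @ A + b with A of shape (q, k), A.T @ A = I.

    Alternating scheme (a generic solver, assumed here): pad the k-column
    target with q - k free columns, solve an orthogonal Procrustes problem,
    set the free columns to their fitted values, and repeat. The residual
    never increases between rounds.
    """
    n, k = ref_low.shape
    q = ref_high.shape[1]
    if q < k:
        raise ValueError("ref_high needs at least as many columns as ref_low")
    padded = np.hstack([ref_low, np.zeros((n, q - k))])
    prev = np.inf
    for _ in range(n_iter):
        rho, rot, b = orthogonal_procrustes_fit(ref_high, padded)
        fitted = rho * ref_high @ rot + b
        # Residual on the real (unpadded) map coordinates only.
        resid = ((ref_low - fitted[:, :k]) ** 2).sum()
        padded[:, k:] = fitted[:, k:]
        if prev - resid < tol:
            break
        prev = resid
    return rho, rot[:, :k], b[:k]


# Hypothetical simulated data, not from the paper.
rng = np.random.default_rng(0)
n, k, q = 200, 2, 4
# Reference map: 200 individuals on 2 ancestry axes.
ref_low = rng.normal(size=(n, k))
# Higher-dimensional PCA of the same individuals: the map plus two extra
# axes, then rotated (possibly reflected), scaled by 3 and shifted by 5.
rot_true, _ = np.linalg.qr(rng.normal(size=(q, q)))
extra = 0.5 * rng.normal(size=(n, q - k))
ref_high = 3.0 * np.hstack([ref_low, extra]) @ rot_true + 5.0

rho, A, b = projection_procrustes(ref_low, ref_high)
mapped_ref = rho * ref_high @ A + b

# A hypothetical study individual observed only in the q-dimensional space.
new_high = 3.0 * np.array([[0.5, -1.0, 0.05, 0.0]]) @ rot_true + 5.0
mapped_new = rho * new_high @ A + b
print(np.round(mapped_new, 3))
```

Allowing reflections (an unrestricted orthogonal matrix) is a deliberate choice here, because principal-component signs are arbitrary. The alternating loop only guarantees a non-increasing residual, so on real data it may stop at a local rather than a global optimum. A production pipeline would need its own convergence checks.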
