Journal: BMC Genetics

On rare variants in principal component analysis of population stratification

Abstract

Population stratification is a known confounder of genome-wide association studies, as it can lead to false positive results. The principal component analysis (PCA) method is widely applied to the analysis of population structure with common variants. However, it remains unclear how well the analysis performs when rare variants are used. We derive a mathematical expectation of the genetic relationship matrix. The variance and covariance elements of the expected matrix depend explicitly on the allele frequencies of the genetic markers used in the PCA. We show that inter-population variance is contained solely in K principal components (PCs), and mostly in the largest K-1 PCs, where K is the number of populations in the samples. We propose F_PC, the ratio of the inter-population variance to the intra-population variance in the K population-informative PCs, and d², the sum of squared distances among populations, as measures of population divergence. We show analytically that when allele frequencies become small, the ratio F_PC declines, the population distance d² decreases, and the portion of variance explained by the K PCs diminishes. The results are validated in an analysis of the 1000 Genomes Project data. The ratio F_PC is 93.85, the population distance d² is 444.38, and the variance explained by the largest five PCs is 17.09% when using common variants with allele frequencies between 0.4 and 0.5. However, the ratio, distance and percentage decrease to 1.83, 17.83 and 0.74%, respectively, with rare variants of frequencies between 0.0001 and 0.01. PCA of population stratification performs worse with rare variants than with common ones. When analyzing population stratification with sequencing data, it is necessary to restrict the selection to common variants only.
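
As a rough illustration of the kind of analysis the abstract describes, the following Python sketch selects markers by allele frequency, builds a genetic relationship matrix, and reports the share of variance carried by the leading PCs. The function names, the standardization by sqrt(2p(1-p)), the use of sample minor allele frequency for the window, and the toy simulation are all assumptions made for illustration. They are not taken from the paper, and the sketch is not expected to reproduce its reported values.

```python
import numpy as np


def minor_allele_frequency(genotypes):
    """Sample minor allele frequency per marker from a 0/1/2 genotype matrix."""
    p = genotypes.mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)


def select_markers(genotypes, maf_low, maf_high):
    """Keep markers whose sample MAF lies in [maf_low, maf_high]."""
    maf = minor_allele_frequency(genotypes)
    keep = (maf >= maf_low) & (maf <= maf_high)
    return genotypes[:, keep]


def genetic_relationship_matrix(genotypes):
    """Assumed standardization: Z Z^T / m with Z = (g - 2p) / sqrt(2p(1-p))."""
    p = genotypes.mean(axis=0) / 2.0
    polymorphic = (p > 0.0) & (p < 1.0)  # monomorphic markers would divide by zero
    g = genotypes[:, polymorphic].astype(float)
    p = p[polymorphic]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def variance_explained(grm, n_components):
    """Fraction of total variance carried by the largest n_components eigenvalues."""
    eigvals = np.linalg.eigvalsh(grm)[::-1]  # eigvalsh returns ascending order
    eigvals = np.clip(eigvals, 0.0, None)    # drop tiny negative rounding errors
    return eigvals[:n_components].sum() / eigvals.sum()


def simulate(rng, n_pops, n_per_pop, n_markers, freq_low, freq_high):
    """Toy data (an assumption): each population draws its own allele
    frequencies uniformly from the window, then genotypes are binomial."""
    blocks = []
    for _ in range(n_pops):
        p = rng.uniform(freq_low, freq_high, size=n_markers)
        blocks.append(rng.binomial(2, p, size=(n_per_pop, n_markers)))
    return np.vstack(blocks)


rng = np.random.default_rng(0)
common = simulate(rng, n_pops=3, n_per_pop=50, n_markers=2000,
                  freq_low=0.4, freq_high=0.5)
grm = genetic_relationship_matrix(select_markers(common, 0.3, 0.5))
# With K = 3 simulated populations, look at the largest K - 1 = 2 PCs.
print(variance_explained(grm, n_components=2))
```

Rerunning the same steps with a low-frequency window (for example `freq_low=0.0001`, `freq_high=0.01`) lets one compare the two regimes on simulated data, though the outcome depends heavily on the simulation model chosen.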