Journal: PLoS Genetics

Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis


Abstract

We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models.

Author Summary

Two different approaches have become widely used in the analysis of population structure: admixture-based models and principal components analysis (PCA). In admixture-based models each individual is assumed to have inherited some proportion of its ancestry from one of several distinct populations. PCA projects the individuals into a low-dimensional subspace. On the face of it, these methods seem to have little in common. Here we show how in fact both of these methods can be viewed within a single unifying framework. This viewpoint should help practitioners to better interpret and contrast the results from these methods in real data applications. It also provides a springboard to the development of novel approaches to this problem. We introduce one such novel approach, based on sparse factor analysis, which has elements in common with both admixture-based models and PCA. As we illustrate here, in some settings sparse factor analysis may provide more interpretable results than either admixture-based models or PCA.
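
The matrix-factorization view described in the abstract can be illustrated with a small sketch. It is not the authors' SFA method: it is plain unconstrained least squares, which corresponds to the PCA-style end of the framework, and it uses a single factor for brevity. The toy genotype matrix and all function names (`center_columns`, `rank_one_factorization`, `squared_error`) are made up for illustration.

```python
# Minimal sketch: approximate a (centered) genotype matrix by the product of
# a column of per-individual loadings and a row of per-SNP factor values.
# Assumption: unconstrained least squares with one factor (PCA-like);
# the data below are hypothetical, not taken from the paper.

def center_columns(G):
    """Subtract each column (SNP) mean from that column."""
    n, p = len(G), len(G[0])
    means = [sum(row[j] for row in G) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in G]

def rank_one_factorization(G, iterations=200):
    """Alternating least squares for G ~= outer(loadings, factor)."""
    n, p = len(G), len(G[0])
    factor = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-zero start
    loadings = [0.0] * n
    for _ in range(iterations):
        # Best loadings for the current factor (least squares, row by row).
        ff = sum(f * f for f in factor)
        loadings = [sum(G[i][j] * factor[j] for j in range(p)) / ff
                    for i in range(n)]
        # Best factor for the current loadings (least squares, column by column).
        ll = sum(l * l for l in loadings)
        factor = [sum(G[i][j] * loadings[i] for i in range(n)) / ll
                  for j in range(p)]
    return loadings, factor

def squared_error(G, loadings, factor):
    """Sum of squared residuals of the rank-one approximation."""
    return sum((G[i][j] - loadings[i] * factor[j]) ** 2
               for i in range(len(G)) for j in range(len(G[0])))

# Rows are individuals, columns are SNPs, entries are allele counts (0, 1, 2).
genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 2, 2],
    [0, 0, 1, 2],
    [0, 1, 2, 2],
]

centered = center_columns(genotypes)
loadings, factor = rank_one_factorization(centered)
total = sum(x * x for row in centered for x in row)
residual = squared_error(centered, loadings, factor)

print("loadings:", [round(l, 3) for l in loadings])
print("share of variation captured:", round(1 - residual / total, 3))
```

In this toy example the first three individuals receive loadings of one sign and the last three the opposite sign, so the single factor separates the two groups. Under the framing in the abstract, other methods would keep the same product form but change the constraints or priors on the two matrices, which in a sketch like this would mean changing the two update steps.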
