
Ancestral Informative Marker Selection and Population Structure Visualization Using Sparse Laplacian Eigenfunctions



Abstract

Identifying a small panel of markers that are informative about population structure can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics, and evolutionary theory in population genetics. Traditional methods for ascertaining ancestral informative markers usually require prior knowledge of individual ancestry and have difficulty with admixed populations. Recently, Principal Components Analysis (PCA) has been used successfully to select SNPs that are highly correlated with the top significant principal components (PCs) without using individual ancestral information. This approach is also applicable to admixed populations. Here we propose a novel approach based on our recent result on summarizing population structure by graph Laplacian eigenfunctions, which differs from PCA in that it is geometric and robust to outliers. Our approach also takes advantage of the a priori sparseness of informative markers in the genome. Using a simulated ring population and the real global population sample HGDP, with 650K SNPs genotyped in 940 unrelated individuals, we validate that the proposed algorithm selects the most informative markers, a small fraction of which can efficiently recover a similar underlying population structure. Employing a standard Support Vector Machine (SVM) to predict individuals' continental memberships on the HGDP dataset of seven continents, we demonstrate that the SNPs selected by our method are more informative and less redundant than those selected by PCA. Our algorithm is a promising tool for genome-wide association studies and population genetics, facilitating the selection of structure informative markers, efficient detection of population substructure, and ancestral inference.
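The abstract names the two ingredients of the approach (graph Laplacian eigenfunctions and sparsity of informative markers) but does not spell out the algorithm. The following is only a minimal sketch of how such a pipeline could look, under several assumptions that are not taken from the paper: a Gaussian-kernel similarity graph between individuals with a median-distance bandwidth, the symmetric normalized Laplacian, and an L1-penalized regression (solved here with plain ISTA) of each eigenvector on the standardized genotypes. All function names, parameters, and the two-population simulation are hypothetical and for illustration; the simulation is neither the ring population nor the HGDP data.

```python
import numpy as np


def standardize(G):
    """Center and scale each SNP column of an (individuals x SNPs) 0/1/2 matrix."""
    return (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)


def laplacian_eigenvectors(X, n_vecs=1):
    """Leading non-trivial eigenvectors of a symmetric normalized graph Laplacian.

    Assumption: edge weights come from a Gaussian kernel on squared Euclidean
    distances between individuals, with the median distance as bandwidth.
    """
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    bandwidth = np.median(d2[np.triu_indices_from(d2, k=1)])
    W = np.exp(-d2 / bandwidth)
    np.fill_diagonal(W, 0.0)                      # no self-loops
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
    return vecs[:, 1:1 + n_vecs]                  # skip the trivial first eigenvector


def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by iterative soft-thresholding."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / (np.linalg.norm(X, 2) ** 2)        # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def select_markers(G, lam=0.4, n_vecs=1):
    """Return indices of SNPs with non-zero sparse-regression weight, plus the eigenvectors."""
    X = standardize(G)
    V = laplacian_eigenvectors(X, n_vecs)
    selected = set()
    for k in range(V.shape[1]):
        y = (V[:, k] - V[:, k].mean()) / V[:, k].std()   # unit-variance response
        selected.update(np.flatnonzero(lasso_ista(X, y, lam)).tolist())
    return sorted(selected), V


def simulate_two_populations(n_per_pop=40, n_informative=20, n_noise=80, seed=0):
    """Toy data: the first n_informative SNPs differ in allele frequency between populations."""
    rng = np.random.default_rng(seed)
    freqs = np.full((2, n_informative + n_noise), 0.5)
    freqs[0, :n_informative] = 0.1
    freqs[1, :n_informative] = 0.9
    labels = np.repeat([0, 1], n_per_pop)
    G = rng.binomial(2, freqs[labels]).astype(float)
    return G, labels


G, labels = simulate_two_populations()
selected, V = select_markers(G)
print("selected SNP indices:", selected)
```

In this toy setting the sign of the first non-trivial eigenvector is expected to split the two simulated populations, and the L1 penalty `lam` controls how many SNPs survive: a larger value gives a smaller panel. The penalty, the kernel, and the bandwidth are tuning choices made for this sketch, not values reported in the abstract.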

Bibliographic record

  • Journal: PLoS Clinical Trials
  • Author: Jun Zhang
  • Year (volume), issue: 2010 (5), 11
  • Pages: e13734
  • Total pages: 12
  • Original format: PDF
