Journal: Bioinformatics

Fast spatial ancestry via flexible allele frequency surfaces


Abstract

Motivation: Unique modeling and computational challenges arise in locating the geographic origin of individuals based on their genetic backgrounds. Single-nucleotide polymorphisms (SNPs) vary widely in informativeness, allele frequencies change non-linearly with geography and reliable localization requires evidence to be integrated across a multitude of SNPs. These problems become even more acute for individuals of mixed ancestry. It is hardly surprising that matching genetic models to computational constraints has limited the development of methods for estimating geographic origins. We attack these related problems by borrowing ideas from image processing and optimization theory. Our proposed model divides the region of interest into pixels and operates SNP by SNP. We estimate allele frequencies across the landscape by maximizing a product of binomial likelihoods penalized by nearest neighbor interactions. Penalization smooths allele frequency estimates and promotes estimation at pixels with no data. Maximization is accomplished by a minorize-maximize (MM) algorithm. Once allele frequency surfaces are available, one can apply Bayes' rule to compute the posterior probability that each pixel is the pixel of origin of a given person. Placement of admixed individuals on the landscape is more complicated and requires estimation of the fractional contribution of each pixel to a person's genome. This estimation problem also succumbs to a penalized MM algorithm.

Results: We applied the model to the Population Reference Sample (POPRES) data. The model gives better localization for both unmixed and admixed individuals than existing methods despite using just a small fraction of the available SNPs. Computing times are comparable with the best competing software.
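The Bayes' rule step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the allele frequency surfaces are already estimated, that SNPs are independent, that each genotype is a binomial draw of two alleles at the pixel's frequency, and that the prior over pixels is uniform unless one is supplied. The function name `pixel_posterior` and its arguments are hypothetical.

```python
import math

def pixel_posterior(freqs, genotype, prior=None):
    """Posterior probability that each pixel is the pixel of origin.

    freqs[k][j]  : assumed allele frequency of SNP j at pixel k, strictly in (0, 1)
    genotype[j]  : number of copies (0, 1 or 2) of the counted allele at SNP j
    prior[k]     : strictly positive prior for pixel k (uniform if omitted)
    """
    n_pixels = len(freqs)
    if prior is None:
        prior = [1.0 / n_pixels] * n_pixels

    # Accumulate log prior + binomial log-likelihood across SNPs for each pixel.
    log_post = []
    for k in range(n_pixels):
        total = math.log(prior[k])
        for p, g in zip(freqs[k], genotype):
            total += (math.log(math.comb(2, g))
                      + g * math.log(p)
                      + (2 - g) * math.log(1.0 - p))
        log_post.append(total)

    # Normalize in a numerically stable way (subtract the maximum first).
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]

# Toy example with three pixels and two SNPs (made-up frequencies).
toy_freqs = [[0.9, 0.8], [0.5, 0.5], [0.1, 0.2]]
toy_genotype = [2, 2]
posterior = pixel_posterior(toy_freqs, toy_genotype)
```

Working in log space keeps the product over many SNPs from underflowing, which matters once hundreds or thousands of SNPs contribute evidence.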
