American Journal of Human Genetics

Accuracy of Haplotype Frequency Estimation for Biallelic Loci via the Expectation-Maximization Algorithm for Unphased Diploid Genotype Data

Abstract

Haplotype analyses have become increasingly common in genetic studies of human disease because of their ability to identify unique chromosomal segments likely to harbor disease-predisposing genes. The study of haplotypes is also used to investigate many population processes, such as migration and immigration rates, linkage-disequilibrium strength, and the relatedness of populations. Unfortunately, many haplotype-analysis methods require phase information that can be difficult to obtain from samples of nonhaploid species. There are, however, strategies for estimating haplotype frequencies from unphased diploid genotype data collected on a sample of individuals that make use of the expectation-maximization (EM) algorithm to overcome the missing phase information. The accuracy of such strategies, compared with other phase-determination methods, must be assessed before their use can be advocated. In this study, we consider and explore sources of error between EM-derived haplotype frequency estimates and their population parameters, noting that much of this error is due to sampling error, which is inherent in all studies, even when phase can be determined. In light of this, we focus on the additional error between haplotype frequencies within a sample data set and EM-derived haplotype frequency estimates incurred by the estimation procedure. We assess the accuracy of haplotype frequency estimation as a function of a number of factors, including sample size, number of loci studied, allele frequencies, and locus-specific allelic departures from Hardy-Weinberg and linkage equilibrium. We point out the relative impacts of sampling error and estimation error, calling attention to the pronounced accuracy of EM estimates once sampling error has been accounted for. 
We also suggest that many factors that may influence accuracy can be assessed empirically within a data set—a fact that can be used to create “diagnostics” that a user can turn to for assessing potential inaccuracies in estimation.
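
To make the estimation step concrete, here is a minimal sketch of EM haplotype frequency estimation for the simplest case of two biallelic loci. It is an illustration under stated assumptions, not the procedure used in the study: the function name `em_two_locus`, the 0/1/2 genotype coding (copies of allele "1" at each locus), the uniform starting frequencies, and the convergence tolerance are all hypothetical choices. With two loci, only double heterozygotes have ambiguous phase, so the E-step reduces to splitting them between the 00/11 and 01/10 haplotype pairs.

```python
def em_two_locus(genotypes, n_iter=200, tol=1e-10):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 by EM.

    genotypes: non-empty list of (g1, g2); each value in {0, 1, 2} is the
    number of copies of allele "1" at that locus (an assumed coding).
    """
    counts = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            # Phase is ambiguous: 00/11 or 01/10. Handled in the E-step.
            n_double_het += 1
            continue
        # Alleles carried by the two chromosomes at each locus.
        a = (0, 0) if g1 == 0 else (1, 1) if g1 == 2 else (0, 1)
        b = (0, 0) if g2 == 0 else (1, 1) if g2 == 2 else (0, 1)
        # At most one locus is heterozygous here, so phase is determined.
        counts[f"{a[0]}{b[0]}"] += 1
        counts[f"{a[1]}{b[1]}"] += 1

    total = 2 * len(genotypes)          # number of chromosomes sampled
    freqs = {h: 0.25 for h in counts}   # uniform start (an assumption)

    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11,
        # assuming Hardy-Weinberg proportions for haplotype pairs.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: add expected counts from double heterozygotes.
        new = {
            "00": (counts["00"] + n_double_het * w) / total,
            "11": (counts["11"] + n_double_het * w) / total,
            "01": (counts["01"] + n_double_het * (1 - w)) / total,
            "10": (counts["10"] + n_double_het * (1 - w)) / total,
        }
        delta = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if delta < tol:
            break
    return freqs
```

Comparing the returned estimates with haplotype frequencies counted directly in a sample whose phase is known would isolate the estimation error the abstract describes, separately from sampling error. Extending the sketch to more loci would require enumerating every haplotype pair compatible with each multilocus genotype.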
