American Journal of Human Genetics

Little Loss of Information Due to Unknown Phase for Fine-Scale Linkage-Disequilibrium Mapping with Single-Nucleotide–Polymorphism Genotype Data



Abstract

We present the results of a simulation study that indicate that true haplotypes at multiple, tightly linked loci often provide little extra information for linkage-disequilibrium fine mapping, compared with the information provided by corresponding genotypes, provided that an appropriate statistical analysis method is used. In contrast, a two-stage approach to analyzing genotype data, in which haplotypes are inferred and then analyzed as if they were true haplotypes, can lead to a substantial loss of information. The study uses our COLDMAP software for fine mapping, which implements a Markov chain–Monte Carlo algorithm that is based on the shattered coalescent model of genetic heterogeneity at a disease locus. We applied COLDMAP to 100 replicate data sets simulated under each of 18 disease models. Each data set consists of haplotype pairs (diplotypes) for 20 SNPs typed at equal 50-kb intervals in a 950-kb candidate region that includes a single disease locus located at random. The data sets were analyzed in three formats: (1) as true haplotypes; (2) as haplotypes inferred from genotypes using an expectation-maximization algorithm; and (3) as unphased genotypes. On average, true haplotypes gave a 6% gain in efficiency compared with the unphased genotypes, whereas inferring haplotypes from genotypes led to a 20% loss of efficiency, where efficiency is defined in terms of root mean integrated square error of the location of the disease locus. Furthermore, treating inferred haplotypes as if they were true haplotypes leads to considerable overconfidence in estimates, with nominal 50% credibility intervals achieving, on average, only 19% coverage. 
We conclude that (1), given appropriate statistical analyses, the costs of directly measuring haplotypes will rarely be justified by a gain in the efficiency of fine mapping and that (2) a two-stage approach of inferring haplotypes followed by a haplotype-based analysis can be very inefficient for fine mapping, compared with an analysis based directly on the genotypes.
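
The data layout and the coverage figure described above can be illustrated with a small sketch. It is not part of COLDMAP and does not reproduce the authors' simulation. The function names, the 0/1 allele coding, and the choice to place the first SNP at position 0 are assumptions made for illustration only. The sketch shows how 20 SNPs at 50-kb spacing span 950 kb, how a diplotype collapses to unphased genotypes, how many diplotypes remain compatible with a genotype vector, and how the empirical coverage of a set of intervals could be computed.

```python
# Illustrative sketch only; names and allele coding are assumptions, not COLDMAP's API.

def snp_positions(n_snps=20, spacing_kb=50):
    """Positions (in kb) of equally spaced SNPs, with the first SNP placed at 0."""
    return [i * spacing_kb for i in range(n_snps)]


def unphase(hap_a, hap_b):
    """Collapse a diplotype (two 0/1 haplotypes) into unphased genotypes.

    Each genotype is the count of allele 1 at that SNP: 0, 1 (heterozygous), or 2.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same number of SNPs")
    return [a + b for a, b in zip(hap_a, hap_b)]


def n_compatible_diplotypes(genotype):
    """Number of distinct unordered haplotype pairs consistent with a genotype vector.

    With h heterozygous SNPs there are 2**(h - 1) such pairs (1 if h == 0),
    which is the phase ambiguity that unphased data leave unresolved.
    """
    hets = sum(1 for g in genotype if g == 1)
    return 1 if hets == 0 else 2 ** (hets - 1)


def coverage(intervals, true_locations):
    """Fraction of (low, high) intervals that contain the corresponding true location."""
    hits = sum(lo <= x <= hi for (lo, hi), x in zip(intervals, true_locations))
    return hits / len(true_locations)


if __name__ == "__main__":
    positions = snp_positions()
    print(positions[0], positions[-1])        # first and last SNP positions in kb
    genotype = unphase([0, 1, 1, 0], [1, 1, 0, 0])
    print(genotype, n_compatible_diplotypes(genotype))
```

In this picture, a nominal 50% credibility interval that is well calibrated would give a `coverage` value near 0.5 across replicates, whereas the abstract reports an average of only 19% when inferred haplotypes are treated as true.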
