American Journal of Human Genetics

Molecular and statistical approaches to the detection and correction of errors in genotype databases.


Abstract

Errors in genotyping data have been shown to have a significant effect on the estimation of recombination fractions in high-resolution genetic maps. Previous estimates of error rates in existing databases have been limited to analyses of relatively few markers and have suggested rates of 0.5% to 1.5%. The present study capitalizes on the fact that, within the Centre d'Etude du Polymorphisme Humain (CEPH) collection of reference families, 21 individuals are members of more than one family, with a separate DNA sample provided by CEPH for each appearance of these individuals. By comparing the genotypes of these individuals across the families in which they occur, an estimated error rate of 1.4% was calculated for all loci in version 4.0 of the CEPH database. Excluding the individuals whom CEPH had clearly identified as appearing in more than one family yielded a 3.0% error rate for the remaining samples, suggesting that some error checking of the identified repeated individuals may occur before data submission. An error rate of 3.0% in the version 4.0 data was also obtained for four chromosome 5 markers that were retyped across the entire CEPH collection. The effects of these errors on a multipoint map were significant: the total sex-averaged map length was 36.09 cM with the errors present and 19.47 cM after they were corrected. Several statistical approaches to detecting and allowing for errors during linkage analysis are presented. One method, which identifies families that may contain errors on the basis of their impact on the maximum LOD score, showed particular promise, especially when combined with limited retyping of the identified families. The demonstrated error rate in an established genotype database has a significant impact on high-resolution mapping, which raises questions about the overall value of incorporating such existing data into new genetic maps.
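The abstract describes two computations but gives no formulas for either. The first is an error rate derived from an individual genotyped in two separate DNA samples. The second is a screen that flags families by their effect on the maximum LOD score. The Python sketch that follows illustrates both under stated assumptions. It is not the authors' procedure, and all function names are hypothetical. Three modelling choices are assumptions:

* Pair discordance is converted to a per-genotype error rate by assuming that errors are independent.
* The LOD score assumes phase-known, fully informative meioses, summarized per family as (recombinants, meioses) counts.
* "Impact on the maximum LOD score" is interpreted as the change in the pooled maximum LOD when one family is left out.

```python
import math

# Hypothetical helpers: neither the names nor the formulas come from the paper.


def discordance_rate(pairs):
    """Fraction of duplicate genotype calls that disagree.

    `pairs` holds one (call_a, call_b) tuple per marker typed in both DNA samples
    of the same individual; a call is an unordered allele pair, or None if missing.
    """
    compared = discordant = 0
    for a, b in pairs:
        if a is None or b is None:
            continue  # a missing call cannot be checked
        compared += 1
        if sorted(a) != sorted(b):  # compare as unordered allele pairs
            discordant += 1
    if compared == 0:
        raise ValueError("no marker was called in both samples")
    return discordant / compared


def per_genotype_error(d):
    """Convert a pair discordance rate d into a per-call error rate e.

    Assumes each call is wrong independently with probability e and that two
    wrong calls never agree, so d = 1 - (1 - e) ** 2.
    """
    return 1 - math.sqrt(1 - d)


def lod(counts, theta):
    """Two-point LOD score summed over families of phase-known, fully
    informative meioses; `counts` is a list of (recombinants, meioses)."""
    total = 0.0
    for r, n in counts:
        if r:
            total += r * math.log10(theta)
        if n - r:
            total += (n - r) * math.log10(1 - theta)
        total += n * math.log10(2)  # null hypothesis: theta = 0.5
    return total


def max_lod(counts):
    """LOD at the pooled estimate theta = R / N, capped at 0.5."""
    recombinants = sum(r for r, _ in counts)
    meioses = sum(n for _, n in counts)
    if meioses == 0:
        return 0.0
    return lod(counts, min(recombinants / meioses, 0.5))


def screen_families(counts, threshold=0.0):
    """Indices of families whose removal raises the pooled maximum LOD by more
    than `threshold`, largest increase first (candidates for retyping)."""
    base = max_lod(counts)
    deltas = [(i, max_lod(counts[:i] + counts[i + 1:]) - base)
              for i in range(len(counts))]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return [i for i, delta in deltas if delta > threshold]
```

The following example uses invented toy counts. In `screen_families([(0, 10), (0, 8), (1, 12), (4, 10)])`, dropping the family with 4 recombinants out of 10 meioses raises the pooled maximum LOD from about 5.50 to about 7.13. Dropping any other family lowers it. The call therefore returns `[3]`, which marks the fourth family as the one worth retyping. The `threshold` of 0 is an arbitrary illustrative default.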
