European Journal of Human Genetics (EJHG)

Handling missing values in population data: consequences for maximum likelihood estimation of haplotype frequencies.

Abstract

Haplotype frequency estimation in population data is an important problem in genetics and different methods including expectation maximisation (EM) methods have been proposed. The statistical properties of EM methods have been extensively assessed for data sets with no missing values. When numerous markers and/or individuals are tested, however, it is likely that some genotypes will be missing. Thus, it is of interest to investigate the behaviour of the method in the presence of incomplete genotype observations. We propose an extension of the EM method to handle missing genotypes, and we compare it with commonly used methods (such as ignoring individuals with incomplete genotype information or treating a missing allele as any other allele). Simulations were performed, starting from data sets of haematopoietic stem cell donors genotyped at three HLA loci. We deleted some data to create incomplete genotype observations in various proportions. We then compared the haplotype frequencies obtained on these incomplete data sets using the different methods to those obtained on the complete data. We found that the method proposed here provides better estimations, both qualitatively and quantitatively, but increases the computation time required. We discuss the influence of missing values on the algorithm's efficiency and the advantages and disadvantages of deleting incomplete genotypes. We propose guidelines for missing data handling in routine analysis.
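
To make the idea of an EM method that tolerates missing genotypes concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes Hardy-Weinberg equilibrium, genotypes missing at random, and that a missing observation removes the whole genotype at a locus (marked `None`). Under those assumptions, a missing locus is treated as compatible with every pair of alleles seen at that locus in the sample, and the E-step sums over all haplotype pairs compatible with each individual. The names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical.

```python
from itertools import product

MISSING = None  # assumed marker for a locus with no observed genotype


def compatible_pairs(genotype, alleles):
    """List the ordered haplotype pairs consistent with one unphased genotype.

    genotype: tuple with one entry per locus, either (a, b) or MISSING.
    alleles:  per-locus lists of the alleles observed in the sample.
    """
    per_locus = []
    for locus, obs in enumerate(genotype):
        if obs is MISSING:
            # Missing locus: any ordered allele pair is possible.
            options = [(a, b) for a in alleles[locus] for b in alleles[locus]]
        else:
            a, b = obs
            # Heterozygotes can be phased in two ways, homozygotes in one.
            options = [(a, b)] if a == b else [(a, b), (b, a)]
        per_locus.append(options)
    pairs = []
    for combo in product(*per_locus):
        h1 = tuple(c[0] for c in combo)
        h2 = tuple(c[1] for c in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, keeping incomplete individuals."""
    n_loci = len(genotypes[0])
    alleles = [
        sorted({a for g in genotypes if g[l] is not MISSING for a in g[l]})
        for l in range(n_loci)
    ]
    expansions = [compatible_pairs(g, alleles) for g in genotypes]
    haplotypes = sorted({h for pairs in expansions for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start

    for _ in range(n_iter):
        # E-step: expected haplotype counts given the current frequencies.
        counts = dict.fromkeys(haplotypes, 0.0)
        for pairs in expansions:
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: each individual carries two haplotypes.
        new_freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # Toy two-locus sample; the last individual is untyped at the second locus.
    sample = [
        (("A", "A"), ("B", "B")),
        (("A", "a"), ("B", "b")),
        (("a", "a"), ("b", "b")),
        (("A", "A"), MISSING),
    ]
    for hap, f in em_haplotype_frequencies(sample).items():
        print(hap, round(f, 4))
```

The alternative the abstract calls ignoring incomplete individuals corresponds to filtering out every genotype that contains `MISSING` before calling the function. Note that the number of compatible pairs grows with the square of the allele count at each missing locus, which is one plausible reason for the extra computation time the abstract reports.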