Proceedings of the National Academy of Sciences of the United States of America

gerbil: Genotype resolution and block identification using likelihood

Abstract

The abundance of genotype data generated by individual and international efforts carries the promise of revolutionizing disease studies and the association of phenotypes with individual polymorphisms. A key challenge is providing an accurate resolution (phasing) of the genotypes into haplotypes. We present here results on a method for genotype phasing in the presence of recombination. Our analysis is based on a stochastic model for recombination-poor regions ("blocks"), in which haplotypes are generated from a small number of core haplotypes, allowing for mutations, rare recombinations, and errors. We formulate genotype resolution and block partitioning as a maximum-likelihood problem and solve it by an expectation-maximization algorithm. The algorithm was implemented in a software package called gerbil (genotype resolution and block identification using likelihood), which is efficient and simple to use. We tested gerbil on four large-scale sets of genotypes. It outperformed two state-of-the-art phasing algorithms. The PHASE algorithm was slightly more accurate than gerbil when allowed to run with default parameters, but required two orders of magnitude more time. When using comparable running times, gerbil was consistently more accurate. For data sets with hundreds of genotypes, the time required by PHASE becomes prohibitive. We conclude that gerbil has a clear advantage for studies that include many hundreds of genotypes and, in particular, for large-scale disease studies.
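
To make the idea of likelihood-based phasing concrete, the following is a minimal sketch of the classical expectation-maximization approach to haplotype frequency estimation within a single block. It is not the gerbil implementation: it omits the core-haplotype mixture, mutations, recombination, errors, and block partitioning described in the abstract, and it enumerates every compatible haplotype pair, which is only feasible for a handful of heterozygous sites. The genotype encoding (0 = homozygous for allele 0, 1 = heterozygous, 2 = homozygous for allele 1) and the names `compatible_pairs` and `em_phase` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs that could produce a genotype.

    Assumed encoding per site: 0 = homozygous 0, 1 = heterozygous,
    2 = homozygous 1.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = base[:], base[:]
        for site, bit in zip(het_sites, bits):
            h1[site], h2[site] = bit, 1 - bit
        # Sort so that (a, b) and (b, a) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_phase(genotypes, iterations=50):
    """Estimate haplotype frequencies by EM and pick the likeliest phasing."""
    pairs_per_genotype = [compatible_pairs(g) for g in genotypes]
    haplotypes = {h for ps in pairs_per_genotype for p in ps for h in p}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}

    for _ in range(iterations):
        counts = defaultdict(float)
        for ps in pairs_per_genotype:
            # E-step: posterior weight of each compatible pair under
            # random mating (factor 2 for two distinct haplotypes).
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in ps]
            total = sum(weights)
            for (a, b), w in zip(ps, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: frequencies are expected counts over 2N chromosomes.
        n_chromosomes = 2 * len(genotypes)
        freq = {h: counts[h] / n_chromosomes for h in haplotypes}

    # Resolve each genotype to its most probable pair under the final estimate.
    phased = [max(ps, key=lambda p: freq[p[0]] * freq[p[1]])
              for ps in pairs_per_genotype]
    return phased, freq


if __name__ == "__main__":
    # Two homozygous individuals and one double heterozygote.
    phased, freq = em_phase([(2, 2), (0, 0), (1, 1)])
    print(phased[2])
```

In this toy input the two homozygous individuals supply evidence for the haplotypes (0, 0) and (1, 1), so the double heterozygote is resolved to that pair rather than to (0, 1) and (1, 0). A model built on a small number of core haplotypes, as the abstract describes, avoids the exponential enumeration this sketch relies on.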
