Phasing of 2-SNP Genotypes Based on Non-random Mating Model

Abstract

Emerging microarray technologies allow genotyping of long genome sequences, resulting in huge amounts of data. A key challenge is to provide accurate phasing of very long single nucleotide polymorphism (SNP) sequences. In this paper we explore phasing of genotypes with 2 SNPs adjusted to the non-random mating model and then apply it to the haplotype inference of complete genotypes using maximum spanning trees. The runtime of the algorithm is O(nm(n + m)), where n and m are the number of genotypes and SNPs, respectively. The proposed phasing algorithm (2SNP) can be used for comparatively accurate phasing of a large number of very long genome sequences. On datasets across 79 regions from HapMap, 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality as measured by the number of correctly phased genotypes, single-site errors, and switching errors. For example, 2SNP requires 41 s on a 2 GHz Pentium 4 processor to phase 30 genotypes with 1381 SNPs (ENm010.7pl5:2 data from HapMap), whereas GERBIL and PHASE require more than a week of runtime and make no fewer errors than 2SNP.
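
The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea it names: decide the phase of each SNP pair, weight the pair by how confident that decision is, and combine pairwise decisions along a maximum spanning tree. Everything here is an assumption made for illustration. The genotype encoding (0 and 1 for the two homozygotes, 2 for a heterozygote), the function names, the add-one smoothing, and the log-ratio weight are hypothetical. In particular, the pairwise rule below is a plain comparison of haplotype counts and does not include the paper's non-random mating adjustment.

```python
import math
from itertools import combinations


def pair_phase(genotypes, i, j):
    """Decide cis (00/11) vs trans (01/10) for SNPs i and j.

    Only genotypes with at most one heterozygous site among i and j
    are counted, since their two haplotypes are unambiguous.
    Returns (is_cis, weight); weight is a hypothetical confidence score.
    """
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for g in genotypes:
        a, b = g[i], g[j]
        if a == 2 and b == 2:
            continue  # double heterozygote: phase is ambiguous
        alleles_a = (0, 1) if a == 2 else (a, a)
        alleles_b = (0, 1) if b == 2 else (b, b)
        for hap in zip(alleles_a, alleles_b):
            counts[hap] += 1
    # Add-one smoothing avoids division by zero (an assumption).
    cis = (counts[(0, 0)] + 1) * (counts[(1, 1)] + 1)
    trans = (counts[(0, 1)] + 1) * (counts[(1, 0)] + 1)
    return cis >= trans, abs(math.log(cis / trans))


def maximum_spanning_tree(num_snps, edges):
    """Kruskal's algorithm on edges (weight, i, j, is_cis), heaviest first."""
    parent = list(range(num_snps))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j, is_cis in sorted(edges, reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j, is_cis))
    return tree


def phase_genotype(genotype, tree):
    """Propagate cis/trans relations along the tree from SNP 0."""
    adjacent = {k: [] for k in range(len(genotype))}
    for i, j, is_cis in tree:
        adjacent[i].append((j, is_cis))
        adjacent[j].append((i, is_cis))
    orientation = {0: 0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v, is_cis in adjacent[u]:
            if v not in orientation:
                orientation[v] = orientation[u] if is_cis else 1 - orientation[u]
                stack.append(v)
    # Homozygous sites keep their allele; heterozygous sites follow the tree.
    hap1 = [orientation[k] if a == 2 else a for k, a in enumerate(genotype)]
    hap2 = [1 - orientation[k] if a == 2 else a for k, a in enumerate(genotype)]
    return hap1, hap2


def phase_all(genotypes):
    """Phase every genotype using one tree built over all SNP pairs."""
    num_snps = len(genotypes[0])
    edges = []
    for i, j in combinations(range(num_snps), 2):
        is_cis, weight = pair_phase(genotypes, i, j)
        edges.append((weight, i, j, is_cis))
    tree = maximum_spanning_tree(num_snps, edges)
    return [phase_genotype(g, tree) for g in genotypes]
```

Calling `phase_all` on a list of equal-length genotype lists returns one pair of haplotypes per genotype. Using the maximum spanning tree means each genotype is resolved through the SNP pairs with the strongest evidence, rather than through arbitrary neighbouring pairs.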
