首页> 外文会议>Algorithms - ESA 2009 >An Efficient Algorithm for Haplotype Inference on Pedigrees with a Small Number of Recombinants
【24h】

An Efficient Algorithm for Haplotype Inference on Pedigrees with a Small Number of Recombinants

机译:一种带有少量重组子的谱系单倍型推断的高效算法

获取原文
获取原文并翻译 | 示例

摘要

Combinatorial (or rule-based) methods for inferring haplotypes from genotypes on a pedigree have been studied extensively in the recent literature. These methods generally try to reconstruct the haplotypes of each individual so that the total number of recombinants is minimized in the pedigree. The problem is NP-hard, although it is known that the number of recombinants in a practical dataset is usually very small. In this paper, we consider the question of how to efficiently infer haplotypes on a large pedigree when the number of recombinants is bounded by a small constant, i.e. the so called k-recombinant haplotype configuration (k-RHC) problem. We introduce a simple probabilistic model for k-RHC where the prior haplotype probability of a founder and the haplotype transmission probability from a parent to a child are all assumed to follow the uniform distribution and k random recombinants are assumed to have taken place uniformly and independently in the pedigree. We present an O(mn log~(k+1) n) time algorithm for k-RHC on tree pedigrees without mating loops, where m is the number of loci and n is the size of the input pedigree, and prove that when 90 log re < m < n~3, the algorithm can correctly find a feasible haplotype configuration that obeys the Mendelian law of inheritance and requires no more than k recombinants withrnprobability 1 - O(k~2 log~2n/mn + 1~2). The algorithm is efficient when k is of a moderate value and could thus be used to infer haplotypes from genotypes on large tree pedigrees efficiently in practice. We have implemented the algorithm as a C++ program named Tree-k-RHC. The implementation incorporates several ideas for dealing with missing data and data with a large number of recombinants effectively. Our experimental results on both simulated and real datasets show that TREE-k-RHC can reconstruct haplotypes with a high accuracy and is much faster than the best combinatorial method in the literature.
机译:在最近的文献中已经广泛研究了从谱系上的基因型推导单倍型的组合(或基于规则)方法。这些方法通常尝试重建每个个体的单倍型,以使重组的总数在谱系中最小化。尽管已知实用数据集中的重组体数量通常很少,但是问题是NP困难的。在本文中,我们考虑了以下问题:当重组子的数量由一个小的常数限制时,即所谓的k重组单倍型构型(k-RHC)问题,如何在一个大的谱系上有效地推断单倍型。我们为k-RHC引入一个简单的概率模型,其中假定创始人的先前单倍型概率和从父母到孩子的单倍型传播概率均遵循均匀分布,并且假设k个随机重组体均发生且独立发生在血统书中。我们提出了不带交配环的树谱系上k-RHC的O(mn log〜(k + 1)n)时间算法,其中m是基因座数,n是输入谱系的大小,并证明当90 log re

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号