首页> 外文会议>Annual European Symposium on Algorithms >An Efficient Algorithm for Haplotype Inference on Pedigrees with a Small Number of Recombinants
【24h】

An Efficient Algorithm for Haplotype Inference on Pedigrees with a Small Number of Recombinants

机译:具有少量重组剂的二倍型推理的高效算法

获取原文

摘要

Combinatorial (or rule-based) methods for inferring haplotypes from genotypes on a pedigree have been studied extensively in the recent literature. These methods generally try to reconstruct the haplotypes of each individual so that the total number of recombinants is minimized in the pedigree. The problem is NP-hard, although it is known that the number of recombinants in a practical dataset is usually very small. In this paper, we consider the question of how to efficiently infer haplotypes on a large pedigree when the number of recombinants is bounded by a small constant, i.e. the so called k-recombinant haplotype configuration (k-RHC) problem. We introduce a simple probabilistic model for k-RHC where the prior haplotype probability of a founder and the haplotype transmission probability from a parent to a child are all assumed to follow the uniform distribution and k random recombinants are assumed to have taken place uniformly and independently in the pedigree. We present an O(mn (log n)~(k+1)) time algorithm for k-RHC on tree pedigrees without mating loops, where m is the number of loci and n is the size of the input pedigree, and prove that when 90 log n < m < n~3, the algorithm can correctly find a feasible haplotype configuration that obeys the Mendelian law of inheritance and requires no more than k recombinants with probability 1 - O(k~2 (log n)~2/m + 1/n~2). The algorithm is efficient when k is of a moderate value and could thus be used to infer haplotypes from genotypes on large tree pedigrees efficiently in practice. We have implemented the algorithm as a C++ program named TREE-K-RHC. The implementation incorporates several ideas for dealing with missing data and data with a large number of recombinants effectively, Our experimental results on both simulated and real datasets show that TREE-K-RHC can reconstruct haplotypes with a high accuracy and is much faster than the best combinatorial method in the literature.
机译:从一个谱系的基因型推断单倍型组合(或规则为基础的)方法已经在最近的文献中广泛研究。这些方法通常试图重建该重组体的总数在系谱被最小化每个单独的这样的单倍型。这个问题是NP难的,但众所周知,重组的实际数据集的数量通常很小。在本文中,我们考虑如何高效地推断在一个大谱系的单倍型的问题,当重组体的数量是由一个小的常数为界,即所谓的k重组单倍型构型(K-RHC)的问题。我们推出了K-RHC一个简单的概率模型,其中一个创始人的前单倍型概率,并从父到子单倍型传输概率都被假定为遵循均匀分布和k随机重组假设已经统一和独立发生在谱系。我们提出了一个O(MN(log n)的〜第(k + 1))对于k-RHC时间算法树系谱没有配套环,其中m是基因座的数目,n是输入谱系的大小,并证明当90日志N

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号