
EFFICIENT HAPLOTYPE INFERENCE FROM PEDIGREES WITH MISSING DATA USING LINEAR SYSTEMS WITH DISJOINT-SET DATA STRUCTURES



Abstract

We study the haplotype inference problem from pedigree data under the zero-recombination assumption, which is well supported by real data for tightly linked markers (i.e., single nucleotide polymorphisms (SNPs)) over a relatively large chromosome segment. We solve the problem in a rigorous mathematical manner by formulating genotype constraints as a linear system of inheritance variables. We then use disjoint-set structures to encode connectivity information among individuals, to detect constraints from genotypes, and to check the consistency of constraints. On a tree pedigree without missing data, our algorithm can output a general solution, as well as the total number of specific solutions, in nearly linear time O(mn · α(n)), where m is the number of loci, n is the number of individuals, and α is the inverse Ackermann function; this is a further improvement over existing algorithms. We also extend the idea to looped pedigrees and to pedigrees with missing data by considering existing (partial) constraints on inheritance variables. The algorithm has been implemented in C++ and will be incorporated into our PedPhase package. Experimental results show that it correctly identifies all 0-recombinant solutions with great efficiency. Comparisons with two other popular algorithms show that the proposed algorithm achieves 10- to 10^5-fold improvements over a variety of parameter settings. The experimental study also provides empirical evidence for the complexity bounds suggested by the theoretical analysis.
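The core idea of the abstract (constraints expressed as equations over binary variables, with disjoint sets recording which variables are linked and whether those links conflict) can be illustrated with a union-find that stores each node's XOR parity relative to its parent. This is a minimal sketch under stated assumptions, not the authors' implementation: every constraint is assumed to have the form x_u ⊕ x_v = c or x_u = c over GF(2), and the class and method names (`ParityDSU`, `union`, `fix`, `count_solutions`, `particular_solution`) are hypothetical, not part of PedPhase. It shows consistency checking, counting solutions as 2^k for k free sets, and extracting one specific solution; the paper's actual derivation of constraints from genotypes, and its handling of loops and missing data, is not reproduced here.

```python
class ParityDSU:
    """Union-find over binary (GF(2)) variables x_0 .. x_{n-1}.

    Hypothetical sketch: each node stores parity[node] = x_node XOR x_parent,
    so every set groups variables whose pairwise XORs are fully determined.
    Node n is an assumed constant-0 node, so x_u = c is encoded as the
    pairwise constraint x_u XOR 0 = c.
    """

    def __init__(self, n):
        self.const = n                    # index of the constant-0 node
        self.parent = list(range(n + 1))
        self.rank = [0] * (n + 1)
        self.parity = [0] * (n + 1)       # x_node XOR x_parent[node]
        self.components = n + 1
        self.consistent = True

    def find(self, u):
        """Return (root, x_u XOR x_root), compressing the path to the root."""
        path = []
        while self.parent[u] != u:
            path.append(u)
            u = self.parent[u]
        root = u
        acc = 0
        # Walk from the node nearest the root outward, accumulating parity
        # and re-pointing each node directly at the root.
        for node in reversed(path):
            acc ^= self.parity[node]
            self.parity[node] = acc
            self.parent[node] = root
        return root, acc

    def union(self, u, v, c):
        """Add x_u XOR x_v = c; return False if it contradicts earlier constraints."""
        ru, pu = self.find(u)
        rv, pv = self.find(v)
        if ru == rv:
            # Already linked: the constraint is either implied or contradictory.
            if pu ^ pv != c:
                self.consistent = False
                return False
            return True
        if self.rank[ru] < self.rank[rv]:  # union by rank
            ru, rv = rv, ru
        # From x_u = pu ^ x_ru and x_v = pv ^ x_rv, the constraint gives
        # x_ru ^ x_rv = c ^ pu ^ pv (symmetric, so the swap above is harmless).
        self.parent[rv] = ru
        self.parity[rv] = c ^ pu ^ pv
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
        self.components -= 1
        return True

    def fix(self, u, c):
        """Add the single-variable constraint x_u = c."""
        return self.union(u, self.const, c)

    def count_solutions(self):
        """Each set not containing the constant node contributes one free bit."""
        if not self.consistent:
            return 0
        return 2 ** (self.components - 1)

    def particular_solution(self):
        """One specific solution (only meaningful if consistent): free roots set to 0."""
        croot, cpar = self.find(self.const)
        solution = []
        for u in range(self.const):
            root, p = self.find(u)
            # In the constant's set, x_const = 0 forces x_root = cpar.
            root_value = cpar if root == croot else 0
            solution.append(p ^ root_value)
        return solution


if __name__ == "__main__":
    dsu = ParityDSU(4)                 # four hypothetical binary variables
    dsu.union(0, 1, 1)                 # x0 XOR x1 = 1
    dsu.union(1, 2, 0)                 # x1 XOR x2 = 0
    dsu.fix(0, 1)                      # x0 = 1
    print(dsu.count_solutions())       # 2 (x3 is unconstrained)
    print(dsu.particular_solution())   # [1, 0, 0, 0]
    print(dsu.union(0, 2, 0))          # False: x0 XOR x2 is already forced to 1
    print(dsu.count_solutions())       # 0
```

Union by rank combined with path compression gives an amortized cost of O(α(n)) per operation, which is presumably where the α(n) factor in the stated O(mn · α(n)) bound comes from when on the order of mn constraints are processed.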

Bibliographic details

  • Journal: other
  • Authors: Xin Li; Jing Li

  • Volume: 7
  • Pages: 297–308
  • Total pages: 24
  • Original format: PDF