Conference: Research in Computational Molecular Biology

Conservative Extensions of Linkage Disequilibrium Measures from Pairwise to Multi-loci and Algorithms for Optimal Tagging SNP Selection



Abstract

We present results on two classes of problems. The first result addresses the long-standing open problem of finding unifying principles for linkage disequilibrium (LD) measures in population genetics (Lewontin 1964 [10], Hedrick 1987 [8], Devlin and Risch 1995 [5]). Two desirable properties have been proposed in the extensive literature on this topic, and the mutual consistency between them has remained at the heart of the statistical and algorithmic difficulties in haplotype and genome-wide association study analysis. The first axiom is (1) the ability to extend an LD measure to multiple loci as a conservative extension of pairwise LD. All widely used LD measures are pairwise measures, and despite significant attempts it is not clear how to extend them naturally to multiple loci, leading to a "curse of the pairwise". The second axiom is (2) the interpretability of intermediate values. In this paper, we resolve this mutual consistency problem by introducing a new LD measure, directed informativeness J (the directed graph-theoretic counterpart of the informativeness measure introduced by Halldorsson et al. [6]), and show that it satisfies both axioms. We also show that the maximum informative subset of tagging SNPs based on directed informativeness can be computed exactly in polynomial time for realistic genome-wide data. Furthermore, we present polynomial-time algorithms for optimal genome-wide tagging SNP selection for a number of commonly used LD measures, under the bounded neighborhood assumption for linked pairs of SNPs. One problem in the area is the search for a quality measure for tagging SNP selection that unifies LD-based methods such as LD-select (implemented in Tagger; de Bakker et al. 2005 [4], Carlson et al. 2004 [3]) and information-theoretic ones such as informativeness.
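Standard LD measures such as Lewontin's D and D' and the correlation r² (the statistic thresholded by LD-select) each take exactly two loci as input. As a reference point, the sketch below computes them from phased haplotypes of two biallelic SNPs using the usual textbook formulas. The 0/1 allele coding and the function name `pairwise_ld` are illustrative assumptions; this is not the paper's directed informativeness measure. The usage line shows a pair on which the two common measures disagree sharply.

```python
def pairwise_ld(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    `haplotypes` is a list of phased (a, b) pairs, with the allele at each
    locus coded 0/1 (an illustrative convention, not a file format).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # freq. of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n       # freq. of allele 1 at locus 2
    p_ab = sum(a * b for a, b in haplotypes) / n  # freq. of the 1-1 haplotype
    d = p_ab - p_a * p_b                          # Lewontin's D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # D' rescales D by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, r2


# The 1-0 haplotype never occurs, so |D'| = 1, yet r^2 is only 1/3.
d, d_prime, r2 = pairwise_ld([(1, 1), (0, 1), (0, 0), (0, 0)])
print(d, d_prime, round(r2, 3))  # 0.125 1.0 0.333
```

Both functions of the same four haplotypes are pairwise by construction; nothing in these formulas says how to combine them over three or more SNPs.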
We show that the objective function of the LD-select algorithm is the minimum dominating set (MDS) on r²-SNP graphs, and that an MDS can be computed in polynomial time for this class of graphs. Although LD-select obtains its "maximally informative" solution through a greedy algorithm, so that the solution is better described as "locally maximally informative", we show that Tagger (LD-select) in fact performs very close to the globally maximally informative optimum.
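The dominating-set view can be made concrete: every SNP must either be a tag or share an r² edge with a tag. The sketch below contrasts a greedy cover in the spirit of LD-select with an exact minimum dominating set computed by dynamic programming over a sliding window. The exact solver assumes, as a hypothetical stand-in for a bounded-neighborhood condition, that every edge joins SNPs at most `w` positions apart; its running time is then linear in the number of SNPs for fixed `w` (exponential only in `w`). The toy graph, the parameter `w`, and all function names are illustrative; this is neither Tagger's implementation nor the paper's algorithm.

```python
# Hypothetical toy model: SNPs are indexed 0..n-1 in chromosome order, and an
# edge joins two SNPs whose r^2 is at or above the tagging threshold.
SEL, DOM, UND = 0, 1, 2  # per-SNP window state: tag, covered by a tag, uncovered


def build_adj(n, edges):
    """Undirected adjacency sets for the r^2-SNP graph."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def greedy_tags(n, adj):
    """Greedy cover: repeatedly take the SNP covering the most untagged SNPs."""
    untagged = set(range(n))
    tags = []
    while untagged:
        # A tag covers itself and its r^2 neighbours; ties go to the lowest index.
        best = max(range(n), key=lambda v: (len(({v} | adj[v]) & untagged), -v))
        tags.append(best)
        untagged -= {best} | adj[best]
    return sorted(tags)


def exact_min_tags(n, adj, w):
    """Exact minimum dominating set, assuming every edge spans at most w SNPs."""
    if any(abs(u - v) > w for u in adj for v in adj[u]):
        raise ValueError("an edge spans more than w SNPs")
    # Map: states of the SNPs in the current window -> (number of tags, tags).
    frontier = {(): (0, ())}
    for i in range(n):
        lo = max(0, i - w)  # window holds SNPs lo..i-1, every possible earlier neighbour of i
        linked = [k for k, v in enumerate(range(lo, i)) if v in adj[i]]
        nxt = {}
        for states, (size, tags) in frontier.items():
            for pick in (True, False):
                new = list(states)
                if pick:
                    for k in linked:  # tagging i covers its earlier neighbours
                        if new[k] == UND:
                            new[k] = DOM
                    new.append(SEL)
                else:
                    covered = any(new[k] == SEL for k in linked)
                    new.append(DOM if covered else UND)
                if i >= w:
                    # SNP i-w has now seen all its neighbours, so it must be covered.
                    if new[0] == UND:
                        continue
                    new = new[1:]
                key = tuple(new)
                cand = (size + 1, tags + (i,)) if pick else (size, tags)
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand
        frontier = nxt
    # SNPs still in the final window must be covered as well.
    _, tags = min((val for key, val in frontier.items() if UND not in key),
                  key=lambda t: t[0])
    return sorted(tags)


edges = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 4), (3, 4),
         (4, 5), (4, 6), (5, 7), (6, 7), (7, 8)]
adj = build_adj(9, edges)
print(greedy_tags(9, adj))        # [0, 4, 7]
print(exact_min_tags(9, adj, 3))  # [1, 7]
```

On this nine-SNP toy graph the greedy rule first takes SNP 4, which covers the most SNPs, and ends with three tags, whereas the exact solver finds the two-tag set {1, 7}. The window state only needs three values per SNP because a SNP's coverage is settled once the scan has moved `w` positions past it.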

Bibliographic details

  • Conference location: Vancouver (CA)
  • Author affiliations:

    Center for Computational Molecular Biology, Department of Computer Science, Brown University, Providence, RI 02912;

    Department of Computer Science, University of California, Davis, CA 95616;

    Center for Computational Molecular Biology, Department of Computer Science, Brown University, Providence, RI 02912;

  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Bioengineering (biotechnology)