BMC Bioinformatics

Whole genome association mapping by incompatibilities and local perfect phylogenies

Abstract

Background
With current technology, vast amounts of data can be produced cheaply and efficiently in association studies. To prevent data analysis from becoming the bottleneck of such studies, fast and efficient analysis methods that scale to these data set sizes must be developed.

Results
We present a fast method for accurate localisation of disease-causing variants in high-density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status and is scored by its accuracy as a decision tree. The rationale is that the perfect phylogeny near a disease-affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, for example because of large marker spacing, the algorithm can allow the inclusion of incompatible markers in order to enlarge the regions before their phylogeny is estimated. Both haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets previously used to test other methods, allowing a direct comparison with them. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease-causing mutation and a constant recombination rate. However, in more complex scenarios involving mutation heterogeneity and more complex haplotype structure, such as that found in the HapMap data, our method outperforms SMA as well as other fast data-mining approaches such as HapMiner and Haplotype Pattern Mining (HPM), while also being significantly faster. For unphased genotype data, an initial phase-estimation step only slightly decreases the power of the method. The method also accurately localised a known susceptibility variant in an empirical data set (the ΔF508 mutation for cystic fibrosis) and found significant signals for association between the CYP2D6 gene and poor drug metabolism, although for this data set the highest association score lies about 60 kb from the CYP2D6 gene.

Conclusion
Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
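The compatibility criterion described in the Results can be made concrete with the four-gamete test, the standard pairwise check for whether two biallelic markers fit on one tree under an infinite-sites assumption. The following is a minimal Python sketch, assuming phased chromosomes coded as rows of 0/1 alleles. The names `compatible` and `largest_compatible_region`, the exhaustive longest-interval search, and its tie-breaking are hypothetical choices for illustration, not Blossoc's documented behaviour. The option to admit incompatible markers is not modelled.

```python
from typing import List, Tuple

# Hypothetical input format: one row per chromosome (haplotype),
# one column per SNP marker, alleles coded 0/1.
Haplotypes = List[List[int]]


def compatible(haps: Haplotypes, i: int, j: int) -> bool:
    """Four-gamete test: two biallelic markers fit on a single tree
    (without recurrent mutation or recombination) unless all four
    combinations 00, 01, 10 and 11 occur among the chromosomes."""
    gametes = {(h[i], h[j]) for h in haps}
    return len(gametes) < 4


def largest_compatible_region(haps: Haplotypes, focal: int) -> Tuple[int, int]:
    """Return (lo, hi), the longest interval of markers containing `focal`
    in which every pair of markers is compatible."""
    n_markers = len(haps[0])
    best = (focal, focal)
    lo = focal
    while lo >= 0:
        # [lo, focal] must itself be pairwise compatible; [lo + 1, focal]
        # was checked on the previous pass, so only marker `lo` is tested.
        if not all(compatible(haps, lo, k) for k in range(lo + 1, focal + 1)):
            break
        # For this left end, extend to the right as far as compatibility allows.
        hi = focal
        while hi + 1 < n_markers and all(
            compatible(haps, hi + 1, k) for k in range(lo, hi + 1)
        ):
            hi += 1
        if hi - lo > best[1] - best[0]:
            best = (lo, hi)
        lo -= 1
    return best


# Toy data, invented for illustration only.
EXAMPLE_HAPS: Haplotypes = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0],
]

if __name__ == "__main__":
    for m in range(len(EXAMPLE_HAPS[0])):
        print(m, largest_compatible_region(EXAMPLE_HAPS, m))
```

On this toy data, markers 1, 2 and 3 all map to the region (1, 3). Marker 4 is incompatible with its only neighbour, so its region contains only itself.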
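Once a compatible region is fixed, its markers define a perfect phylogeny. If the tree is rooted at one chromosome, every compatible pair of markers yields clades that are either nested or disjoint. The finest partition of the fully grown tree therefore groups together exactly those chromosomes that carry identical alleles across the region. The Python sketch below uses that fact to score the tree as a decision tree in which every leaf predicts its majority status.

This is a deliberately simplified, hypothetical stand-in. The abstract does not specify Blossoc's scoring, and a fully grown tree with majority voting rewards regions with many distinct haplotypes. The label-permutation test used here to compare against "random" is also an assumed null, not necessarily the paper's.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Haplotypes = List[List[int]]  # rows = chromosomes, columns = 0/1 markers


def tree_accuracy(haps: Haplotypes, status: Sequence[int], lo: int, hi: int) -> float:
    """Accuracy of the fully grown perfect phylogeny of markers lo..hi used
    as a decision tree (status: 1 = case, 0 = control). For pairwise-
    compatible markers, each group of chromosomes with identical alleles in
    the region is one leaf, and each leaf predicts its majority status."""
    leaves: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for h, s in zip(haps, status):
        leaves[tuple(h[lo:hi + 1])][s] += 1
    correct = sum(max(counts.values()) for counts in leaves.values())
    return correct / len(haps)


def permutation_pvalue(haps: Haplotypes, status: Sequence[int], lo: int, hi: int,
                       n_perm: int = 1000, seed: int = 0) -> float:
    """Hypothetical null: how often does the same tree score at least as well
    against randomly shuffled case/control labels?"""
    rng = random.Random(seed)
    observed = tree_accuracy(haps, status, lo, hi)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if tree_accuracy(haps, labels, lo, hi) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data, invented for illustration only.
HAPS = [[0, 0], [0, 0], [1, 0], [1, 1], [1, 1], [0, 0]]
STATUS = [0, 0, 1, 1, 1, 0]

if __name__ == "__main__":
    print(tree_accuracy(HAPS, STATUS, 0, 1))  # leaves 00 / 10 / 11
    print(tree_accuracy(HAPS, STATUS, 1, 1))  # leaves 0 / 1
    print(permutation_pvalue(HAPS, STATUS, 0, 1, n_perm=200))
```

With both toy markers, every leaf is pure, so the accuracy is 1.0. With marker 1 alone, one case chromosome shares a leaf with three controls, and the accuracy drops to 5/6.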

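Single marker association (SMA), the baseline the method is compared against, scores each marker on its own. Below is a minimal Python sketch using a Pearson allelic chi-square test on phased chromosomes. The choice of statistic is an assumption, since the abstract does not say which SMA test was used, and the function names are hypothetical.

```python
from typing import List, Sequence

Haplotypes = List[List[int]]  # rows = chromosomes, columns = 0/1 markers


def allelic_chi2(alleles: Sequence[int], status: Sequence[int]) -> float:
    """Pearson chi-square statistic for the 2x2 table of
    (control, case) x (allele 0, allele 1), counted per chromosome."""
    table = [[0, 0], [0, 0]]  # table[status][allele]
    for a, s in zip(alleles, status):
        table[s][a] += 1
    n = len(alleles)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty margins, e.g. monomorphic markers
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def sma_scan(haps: Haplotypes, status: Sequence[int]) -> List[float]:
    """Score every marker independently; the highest score is the SMA
    estimate of the disease locus."""
    n_markers = len(haps[0])
    return [allelic_chi2([h[m] for h in haps], status) for m in range(n_markers)]


if __name__ == "__main__":
    # Toy data, invented for illustration only: marker 0 tracks status,
    # marker 1 does not.
    haps = [[0, 0], [0, 1], [1, 0], [1, 1]]
    status = [0, 0, 1, 1]
    scores = sma_scan(haps, status)
    print(scores, scores.index(max(scores)))
```

On the toy data, the scan returns scores of 4.0 and 0.0, so the SMA peak falls on marker 0.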