BMC Bioinformatics

Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA


Abstract

Background: DNA pooling is a cost-effective alternative in genome-wide association studies. In DNA pooling, equimolar amounts of DNA from different individuals are mixed into one sample, and the frequency of each allele at each position is observed in a single genotyping experiment. Beyond single-locus analysis, identifying haplotype frequencies from pooled data is of separate interest in these studies, because haplotypes could increase statistical power and provide additional insight.

Results: We developed a method for maximum-parsimony haplotype frequency estimation from pooled DNA data, based on the sparse representation of the DNA pools in a dictionary of haplotypes. Extensions to scenarios where data are noisy or even missing are also presented. The resulting method is first applied to simulated data based on the haplotypes of the AGT gene and their associated frequencies. We further evaluate our methodology on datasets consisting of SNPs from the first 7 Mb of the HapMap CEU population. Noise and missing data were then introduced into these datasets to test the extensions of the proposed method. Both HIPPO and HAPLOPOOL were also applied to these datasets to compare performance.

Conclusions: We evaluate our methodology in scenarios where pooling is more efficient than individual genotyping; that is, in datasets that contain pools with a small number of individuals. We show that in such scenarios our methodology outperforms state-of-the-art methods such as HIPPO and HAPLOPOOL.
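
As a rough illustration of the idea described in the abstract, the sketch below treats each pool as a vector of per-SNP allele counts and searches for the smallest set of haplotypes, drawn from a dictionary of all binary haplotypes, that explains every pool jointly. It is a brute-force toy under stated assumptions (noise-free counts, no missing data, a handful of SNPs) and is not the authors' sparse-representation algorithm; the function names, the parameter `n_chromosomes`, and the example pools are all hypothetical.

```python
from itertools import combinations, combinations_with_replacement, product


def pool_allele_counts(haplotypes):
    """Per-SNP count of allele 1 over the haplotypes mixed into one pool."""
    return tuple(sum(column) for column in zip(*haplotypes))


def explain_pool(counts, candidates, n_chromosomes):
    """Find a multiset of candidate haplotypes reproducing the observed counts.

    Returns the multiset as a tuple, or None if the candidates cannot
    explain this pool exactly (noise-free assumption).
    """
    for combo in combinations_with_replacement(candidates, n_chromosomes):
        if pool_allele_counts(combo) == counts:
            return combo
    return None


def most_parsimonious(pools, n_snps, n_chromosomes):
    """Smallest haplotype set that explains all pools at once (brute force)."""
    dictionary = list(product((0, 1), repeat=n_snps))  # all 2**n_snps haplotypes
    for k in range(1, len(dictionary) + 1):
        for subset in combinations(dictionary, k):
            assignments = [explain_pool(c, subset, n_chromosomes) for c in pools]
            if all(a is not None for a in assignments):
                return subset, assignments
    return None, None


def frequencies(assignments):
    """Haplotype frequencies across all pools, from the chosen assignments."""
    flat = [h for pool in assignments for h in pool]
    return {h: flat.count(h) / len(flat) for h in sorted(set(flat))}


# Hypothetical example: 3 SNPs, 3 pools of 2 individuals (4 chromosomes each).
pools = [(2, 2, 2), (3, 3, 1), (4, 4, 0)]
subset, assignments = most_parsimonious(pools, n_snps=3, n_chromosomes=4)
print(subset)                    # ((0, 0, 1), (1, 1, 0))
print(frequencies(assignments))  # {(0, 0, 1): 0.25, (1, 1, 0): 0.75}
```

The exhaustive search grows exponentially with the number of SNPs, which is why a practical method would replace it with an optimization over a sparse frequency vector; the toy only shows what "most parsimonious explanation of several pools at once" means.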
