Whole genome analyses accounting for structures in genotype data.

Abstract

Whole genome analysis is a powerful tool for accurately predicting the genetic merit of selection candidates and for mapping quantitative trait loci (QTL) with high resolution. Single-nucleotide polymorphism (SNP) markers that cover the entire genome reveal information about QTL through either linkage disequilibrium (LD) with the QTL in founders or cosegregation (CS) with the QTL in nonfounders, given a pedigree. Owing to advances in molecular biology and the associated drop in the cost of genotyping, both the density of SNPs and the number of individuals with phenotypes and genotypes are increasing dramatically in whole genome analyses. Consider a matrix of genotypes collected for analysis, where rows are the genotypes of individuals across SNPs and columns are the genotypes of SNPs across individuals. As explained below, structures exist in such a genotype matrix and will become more evident and important as the SNP density and the training population size increase.

Horizontally, haplotype block structures are observed across SNP loci in the genome due to historical cosegregation, which creates LD, or recent cosegregation. These structures exist even in the gametes of a single individual. Statistical dependence among SNP effects is therefore expected within small chromosomal segments given the presence of QTL. However, most methods for whole genome analyses do not account for this dependence among SNP effects.

Vertically, individuals in the pedigree will share a large proportion of alleles that are identical-by-descent (IBD) if they have a common recent ancestor, or vice versa. The genomic (IBD) relationship structure therefore manifests at each locus across individuals in the pedigree, and for closely linked loci these structures will be very similar due to CS. Alleles that are identical-by-descent are also identical-by-state (IBS), but the converse is not true. Thus, the genomic relationship structures may not be properly accounted for by methods that use IBS relationships computed from SNP genotypes.

Two methods, BayesN and the QTL model, have been developed in this thesis to account for the structure in the genotypes that are used for whole genome analyses. BayesN is a nested marker effects model, where SNP effects in each small genomic window are a priori considered dependent. Compared with BayesB, where the structure in the genome is ignored and SNP effects are assumed to be independently and identically distributed, BayesN gave a higher accuracy of genomic prediction for breeding values, especially when high-density SNP panels were used and the QTL had rare alleles. When BayesN was used for QTL discovery, the proportion of false positives (PFP) for finding QTL was perfectly controlled in the case of common QTL alleles and was controlled better than with BayesB in the case of rare QTL alleles. At the same level of PFP, BayesN had a higher power than BayesB for detecting QTL that had rare alleles and explained at least 1% of the total genetic variance.

The QTL model includes the effects of the unobserved QTL genotypes, and the phenotype therefore has a mixture distribution. The mixture model exploits information from the pedigree, LD and CS optimally to model the QTL allele states in founders and allele inheritance in nonfounders. Thus, the QTL model accounts for horizontal structure across loci and vertical structure across individuals, because only information from the SNPs that are within a small chromosomal segment contributes to the modeling of QTL alleles in that segment. In a range of pedigree structures, the QTL model had a substantially higher accuracy than BayesC for genomic prediction when the training population consisted of multiple families, generations, or breeds.

In QTL discovery, signal from the QTL may bleed into neighboring genomic windows depending on the structures of the genome. It is therefore suggested to search for QTL in the window that has a positive test result as well as in its flanking windows, or to use hypotheses that only test for large genetic variance (at least 1% of the total genetic variance, for example).

In conclusion, parsimonious and sophisticated methods that account for the horizontal and vertical structures in genotypes were developed for whole genome analyses. Both methods gave higher accuracy of genomic prediction and trait loci discovery than the widely used methods that ignore these structures. Both methods are expected to be more efficient with respect to computing time and performance as higher SNP densities or sequence data are used in whole genome analyses. (Abstract shortened by UMI.)
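As a rough illustration of the genotype matrix described in the abstract, the sketch below stores genotypes as allele counts (0, 1 or 2) with rows as individuals and columns as SNPs, computes a simple IBS similarity between individuals (the "vertical" view), and groups SNP columns into contiguous windows (the "horizontal" view). It is not the BayesN or QTL model from the thesis. The 0/1/2 coding, the fixed window size, and the function names `ibs_similarity`, `ibs_matrix` and `snp_windows` are assumptions made for this example only; the abstract does not say how windows are defined.

```python
from typing import List

# Assumed coding: each entry is the count (0, 1 or 2) of one allele at a SNP.
# Rows are individuals, columns are SNPs, as in the abstract's genotype matrix.
Genotypes = List[List[int]]


def ibs_similarity(g1: List[int], g2: List[int]) -> float:
    """Proportion of alleles shared identical-by-state, averaged over loci."""
    if len(g1) != len(g2):
        raise ValueError("both individuals must be genotyped at the same SNPs")
    # At one locus, two unphased genotypes share 2 - |a - b| alleles by state.
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def ibs_matrix(genotypes: Genotypes) -> List[List[float]]:
    """Pairwise IBS similarities across individuals (vertical structure)."""
    n = len(genotypes)
    return [
        [ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
        for i in range(n)
    ]


def snp_windows(n_snps: int, window_size: int) -> List[List[int]]:
    """Group SNP column indices into contiguous windows (horizontal structure).

    Fixed-size windows are an illustrative assumption, not the thesis's rule.
    """
    return [
        list(range(start, min(start + window_size, n_snps)))
        for start in range(0, n_snps, window_size)
    ]


# Toy data: three individuals genotyped at six SNPs (made up for illustration).
genotypes: Genotypes = [
    [0, 1, 2, 2, 0, 1],
    [0, 1, 2, 1, 0, 0],
    [2, 1, 0, 0, 2, 1],
]

if __name__ == "__main__":
    for row in ibs_matrix(genotypes):
        print([round(value, 3) for value in row])
    print(snp_windows(n_snps=6, window_size=4))
```

An IBS measure of this kind cannot tell whether shared alleles descend from a common recent ancestor, which is the limitation the abstract raises for methods that rely on IBS relationships computed from SNP genotypes.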

Bibliographic record

  • Author: Zeng, Jian
  • Author affiliation: Iowa State University
  • Degree-granting institution: Iowa State University
  • Subject: Genetics; Animal sciences; Biostatistics
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 160 p.
  • Total pages: 160
  • Original format: PDF
  • Language of text: eng
