Journal: Computation

Computation of the Likelihood of Joint Site Frequency Spectra Using Orthogonal Polynomials

Abstract

In population genetics, information about evolutionary forces, e.g., mutation, selection and genetic drift, is often inferred from DNA sequence information. Generally, DNA consists of two long strands of nucleotides or sites that pair via the complementary bases cytosine and guanine (C and G), on the one hand, and adenine and thymine (A and T), on the other. With whole genome sequencing, most genomic information stored in the DNA has become available for multiple individuals of one or more populations, at least in humans and model species, such as fruit flies of the genus Drosophila. In a genome-wide sample of L sites for M (haploid) individuals, the state of each site may be made binary, by binning the complementary bases, e.g., C with G to C/G, and contrasting C/G to A/T, to obtain a “site frequency spectrum” (SFS). Two such samples of either a single population from different time-points or two related populations from a single time-point are called joint site frequency spectra (joint SFS). While mathematical models describing the interplay of mutation, drift and selection have been available for more than 80 years, calculation of exact likelihoods from joint SFS is difficult. Sufficient statistics for inference of, e.g., mutation or selection parameters that would make use of all the information in the genomic data are rarely available. Hence, often suites of crude summary statistics are combined in simulation-based computational approaches. In this article, we use a bi-allelic boundary-mutation and drift population genetic model to compute the transition probabilities of joint SFS using orthogonal polynomials. This allows inference of population genetic parameters, such as the mutation rate (scaled by the population size) and the time separating the two samples. We apply this inference method to a population dataset of neutrally-evolving short intronic sites from six DNA sequences of the fruit fly Drosophila melanogaster and the reference sequence of the related species Drosophila sechellia.
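
As a rough illustration of the binning step described in the abstract, the following Python sketch tabulates an SFS and a joint SFS from aligned sequences. It is not the authors' code: the function names (`cg_counts`, `sfs`, `joint_sfs`), the choice to skip sites containing any base other than A, C, G or T, and the toy sequences are all assumptions made for this example. It covers only the data summary, not the orthogonal-polynomial likelihood computation.

```python
STRONG = {"C", "G"}  # binned together as "C/G"
WEAK = {"A", "T"}    # binned together as "A/T"


def cg_counts(sequences):
    """Per aligned site, count the sequences carrying C or G.

    Assumption for this sketch: a site with any base outside
    A/C/G/T (e.g. N or a gap) is marked None and skipped later.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    counts = []
    for column in zip(*sequences):
        bases = [b.upper() for b in column]
        if all(b in STRONG or b in WEAK for b in bases):
            counts.append(sum(b in STRONG for b in bases))
        else:
            counts.append(None)
    return counts


def sfs(sequences):
    """SFS of M sequences: entry k is the number of sites at which
    exactly k of the M sequences are in state C/G (k = 0..M)."""
    spectrum = [0] * (len(sequences) + 1)
    for k in cg_counts(sequences):
        if k is not None:
            spectrum[k] += 1
    return spectrum


def joint_sfs(sample_a, sample_b):
    """Joint SFS of two samples aligned to the same sites: entry
    [i][j] is the number of sites with i C/G copies in sample_a
    and j C/G copies in sample_b."""
    counts_a, counts_b = cg_counts(sample_a), cg_counts(sample_b)
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must cover the same sites")
    table = [[0] * (len(sample_b) + 1) for _ in range(len(sample_a) + 1)]
    for i, j in zip(counts_a, counts_b):
        if i is not None and j is not None:
            table[i][j] += 1
    return table


# Toy data (invented): three sequences in one sample, one in the other.
sample_a = ["CATG", "GATA", "CTTA"]
sample_b = ["CAAG"]
print(sfs(sample_a))
print(joint_sfs(sample_a, sample_b))
```

With M1 sequences in the first sample and M2 in the second, the joint SFS is an (M1 + 1) × (M2 + 1) table of site counts. For the dataset mentioned in the abstract, this would correspond to a 7 × 2 table (six Drosophila melanogaster sequences against one Drosophila sechellia reference), assuming the reference is treated as a sample of size one.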