Journal: Virus Evolution

The effects of sampling strategy on the quality of reconstruction of viral population dynamics using Bayesian skyline family coalescent methods: A simulation study

Abstract

The ongoing large-scale increase in the total amount of genetic data for viruses and other pathogens has led to a situation in which it is often not possible to include every available sequence in a phylogenetic analysis and expect the procedure to complete in reasonable computational time. This raises questions about how a set of sequences should be selected for analysis, particularly if the data are used to infer more than just the phylogenetic tree itself. The design of sampling strategies for molecular epidemiology has been a neglected field of research. This article describes a large-scale simulation exercise that was undertaken to select an appropriate strategy when using the GMRF skygrid, one of the Bayesian skyline family of coalescent methods, in order to reconstruct past population dynamics. The simulated scenarios were intended to represent sampling for the population of an endemic virus across multiple geographical locations. Large phylogenies were simulated under a coalescent or structured coalescent model and sequences simulated from these trees; the resulting datasets were then downsampled for analyses according to a variety of schemes. Variation in results between different replicates of the same scheme was not insignificant, and as a result, we recommend that where possible analyses are repeated with different datasets in order to establish that elements of a reconstruction are not simply the result of the particular set of samples selected. We show that an individual stochastic choice of sequences can introduce spurious behaviour in the median line of the skygrid plot and that even marginal likelihood estimation can suggest complicated dynamics that were not in fact present. We recommend that the median line should not be used to infer historical events on its own. 
Sampling sequences with uniform probability with respect to both time and spatial location (deme) never performed worse than sampling with probability proportional to the effective population size at that time and in that location, and was frequently superior. As a result, we recommend this approach in the design of future studies. We also confirm that the inclusion of many recent sequences from a single geographical location in an analysis tends to result in a spurious bottleneck effect in the reconstruction, and we caution against interpreting this as genuine.
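
As a rough illustration of the two downsampling schemes compared in the abstract, the following Python sketch draws sequences either uniformly across (time bin, deme) strata or with stratum weights supplied by the caller, for example a hypothetical effective population size per stratum. This is not the authors' implementation. The record layout, the function name `downsample`, and the `weight_fn` parameter are all assumptions made for the example, and the abstract does not specify how time was binned.

```python
import random
from collections import defaultdict


def downsample(records, n, weight_fn=None, seed=0):
    """Draw up to n records without replacement, stratified by (time_bin, deme).

    records   : list of (seq_id, time_bin, deme) tuples (assumed layout).
    weight_fn : hypothetical callable (time_bin, deme) -> relative weight.
                None means every non-empty stratum is equally likely,
                i.e. uniform with respect to time and deme.
    seed      : changing the seed gives a different replicate dataset.
    """
    rng = random.Random(seed)

    # Group records into strata and shuffle within each stratum.
    strata = defaultdict(list)
    for rec in records:
        strata[(rec[1], rec[2])].append(rec)
    for members in strata.values():
        rng.shuffle(members)

    keys = sorted(strata)
    weights = {k: 1.0 if weight_fn is None else weight_fn(*k) for k in keys}

    chosen = []
    while len(chosen) < n:
        # Only strata that still hold sequences and have positive weight.
        available = [k for k in keys if strata[k] and weights[k] > 0]
        if not available:
            break  # fewer than n eligible records exist
        k = rng.choices(available, weights=[weights[a] for a in available])[0]
        chosen.append(strata[k].pop())
    return chosen


# Toy data: 3 time bins, 2 demes, unevenly populated (illustrative only).
toy_records = [(f"seq{i}", i % 3, "A" if i % 4 else "B") for i in range(60)]

# Several replicates of the uniform scheme, one per seed, so that a
# reconstruction can be checked against more than one set of samples.
replicates = [downsample(toy_records, 20, seed=s) for s in range(3)]
```

Passing a `weight_fn` that returns an estimate of effective population size for each stratum would mimic the proportional scheme. Running the same scheme with several seeds follows the abstract's recommendation to repeat analyses on different datasets.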