Journal of Biological Research

A survey of methods and tools to detect recent and strong positive selection

Abstract

Positive selection occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and, due to genetic hitchhiking, the neighboring linked variation diminishes, creating so-called selective sweeps. Traces of positive selection in genomes are detected by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum (SFS), and particular linkage disequilibrium (LD) patterns in the region. A variety of methods and tools can be used for detecting sweeps, ranging from simple implementations that compute summary statistics such as Tajima's D, to more advanced statistical approaches that use combinations of statistics, maximum likelihood, machine learning, etc. In this survey, we present and discuss summary statistics and software tools, and classify them based on the selective sweep signature they detect, i.e., SFS-based vs. LD-based, as well as their capacity to analyze whole genomes or just subgenomic regions. Additionally, we summarize the results of comparisons among four open-source software releases (SweeD, SweepFinder, SweepFinder2, and OmegaPlus) regarding sensitivity, specificity, and execution times. In equilibrium neutral models or mild bottlenecks, both SFS- and LD-based methods are able to detect selective sweeps accurately. Methods and tools that rely on LD exhibit higher true positive rates than SFS-based ones under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to represent the null hypothesis. When the correct demographic model (or one similar to it) is used instead, the false positive rates are considerably reduced. The accuracy of detecting the true target of selection is decreased in bottleneck scenarios. In terms of execution time, LD-based methods are typically faster than SFS-based methods, due to the nature of the required arithmetic.
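
As an illustration of the simplest class of approach mentioned above, the following is a minimal sketch of how Tajima's D can be computed for a single genomic window. It is not the implementation used by any of the surveyed tools. The function name `tajimas_d` and the input encoding (one string of '0'/'1' characters per sampled chromosome, with '1' marking the derived allele) are assumptions made for this example. The statistic compares the average number of pairwise differences with the number of segregating sites scaled by a harmonic sum, so an excess of rare variants, as expected after a sweep, gives negative values.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for one window.

    haplotypes: list of equal-length strings over {'0', '1'}, one per
    sampled chromosome (hypothetical input format for this sketch).
    Returns None when D is undefined (no segregating sites, or n < 4).
    """
    n = len(haplotypes)
    if n < 4:
        return None  # the variance term is zero for n = 2 and n = 3
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    # Derived-allele count per site; keep only segregating sites.
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(length)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return None

    # Average number of pairwise differences (pi).
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (pi - s / a1) / sqrt(variance)


if __name__ == "__main__":
    # Toy window: every variant is a singleton (excess of rare alleles).
    rare = ["1000", "0100", "0010", "0001", "0000", "0000"]
    print(tajimas_d(rare))
```

In a genome scan, a function like this would typically be applied to sliding windows along the alignment, and windows with unusually negative values would be treated as candidate sweep regions. As the abstract notes, such outliers need to be judged against an appropriate demographic null model.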
