Foreign-language journal: Genetics, selection, evolution

Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing



Abstract

Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and many standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing, there is a need to understand the impact of these errors and bias on resulting genotype calls from low-coverage sequencing. We used a dataset of 26 pigs sequenced both at 2× with multiplexing and at 30× without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, which is a default and desired step of some variant callers for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage sequence data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. We propose a simple pipeline to correct the preferential bias towards the reference allele that can occur during variant discovery and we recommend that users of low-coverage sequence data be wary of unexpected biases that may be produced by bioinformatic tools that were designed for high-coverage sequence data.
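The mechanism behind the pruning bias can be illustrated with a small calculation. The sketch below is not the authors' pipeline. It assumes that read depth at a site is Poisson-distributed with mean equal to the nominal coverage, that each read at a heterozygous site carries the alternative allele with probability 0.5, and that a hypothetical caller discards the alternative haplotype when fewer than `min_alt_reads` reads support it. The function name `prob_alt_pruned` and the threshold of 2 are illustrative choices, not values taken from the paper.

```python
import math


def prob_alt_pruned(coverage, min_alt_reads, alt_fraction=0.5):
    """Probability that a true alternative allele is pruned at one site.

    Assumptions (illustrative, not from the paper):
    - total depth ~ Poisson(coverage)
    - each read carries the alternative allele with probability alt_fraction
    so the number of alt-supporting reads ~ Poisson(coverage * alt_fraction).
    The allele is pruned when fewer than min_alt_reads reads support it.
    """
    mean_alt = coverage * alt_fraction
    # Poisson CDF evaluated at min_alt_reads - 1
    return sum(
        math.exp(-mean_alt) * mean_alt**k / math.factorial(k)
        for k in range(min_alt_reads)
    )


if __name__ == "__main__":
    for coverage in (2, 30):
        p = prob_alt_pruned(coverage, min_alt_reads=2)
        print(f"{coverage}x coverage: P(alt allele pruned) = {p:.6f}")
```

Under these assumptions, most heterozygous sites at 2× would lose their alternative allele to a two-read threshold, while at 30× the same threshold removes almost none. This matches the direction of the bias described in the abstract, but it does not reproduce the reported 19.0 percentage point figure.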
