BMC Bioinformatics
A statistical framework for detecting mislabeled and contaminated samples using shallow-depth sequence data

Abstract

Background: Researchers typically sequence a given individual multiple times, either re-sequencing the same DNA sample (technical replication), sequencing different DNA samples collected from the same individual (biological replication), or both. Before merging the data from these replicate sequence runs, it is important to verify that no errors, such as DNA contamination or sample mix-ups, occurred in the data collection pipeline. Methods to detect such errors exist, but they are often ad hoc, cannot handle missing data, and in several cases require phased data. Because they require some combination of genotype calling, imputation, and haplotype phasing, these methods are unsuitable for error detection in low- to moderate-depth sequence data, where such tasks are difficult to perform accurately. Additionally, because most existing methods detect errors through pairwise comparisons rather than joint analysis of the putative replicates, results may be difficult to interpret.
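As a rough illustration of why genotype calling is difficult at shallow depth, consider a truly heterozygous site. The sketch below is not taken from the paper; it assumes each read samples one of the two alleles with probability 1/2, independently, with no sequencing error. The function name `prob_both_alleles_seen` is hypothetical. Under these simplifying assumptions, the chance that both alleles appear among d reads is 1 - 2(1/2)^d.

```python
def prob_both_alleles_seen(depth: int) -> float:
    """Probability that both alleles of a heterozygous site appear among `depth` reads.

    Assumption (illustrative only): each read draws either allele with
    probability 1/2, independently, and reads contain no sequencing errors.
    """
    if depth < 2:
        # With zero or one read, at most one allele can be observed.
        return 0.0
    # Complement of "all reads carry allele A" or "all reads carry allele B".
    return 1.0 - 2.0 * 0.5 ** depth


if __name__ == "__main__":
    for depth in (1, 2, 4, 8, 30):
        print(f"depth={depth:>2}  P(both alleles seen)={prob_both_alleles_seen(depth):.4f}")
```

At a depth of 2 this probability is only 0.5, so half of all heterozygous sites would look homozygous from the reads alone, which is one reason methods that depend on called genotypes struggle with shallow-depth data.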