Journal: BMC Bioinformatics

Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate


Abstract

Background: DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads and suggest possible improvements.

Results: In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof-of-concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired-end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.

Conclusion: Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow a proper trade-off between sensitivity and precision to be chosen.
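
The idea of choosing the highest acceptable minimal distance under a tail area-based FDR can be illustrated with a small sketch. This is not the authors' implementation, which the abstract does not describe in detail. It assumes Hamming distance as the distance measure, a set of "null" distances obtained from sequences known to be unbarcoded (for example, windows drawn from reference DNA), and a conservative proportion of unbarcoded reads `pi0 = 1.0`. All function names and parameters here are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_barcode_distance(read, barcodes):
    """Smallest Hamming distance between any barcode and any same-length
    window of the read. Returns None if the read is shorter than every barcode."""
    best = None
    for bc in barcodes:
        k = len(bc)
        for i in range(len(read) - k + 1):
            d = hamming(read[i:i + k], bc)
            if best is None or d < best:
                best = d
    return best


def tail_fdr(threshold, observed, null, pi0=1.0):
    """Tail-area FDR estimate at a distance threshold:
    pi0 * P_null(D <= threshold) / P_observed(D <= threshold).
    `null` holds distances from unbarcoded sequences (an assumption of this sketch)."""
    obs_frac = sum(d <= threshold for d in observed) / len(observed)
    if obs_frac == 0:
        return 0.0  # no read is called at this threshold
    null_frac = sum(d <= threshold for d in null) / len(null)
    return min(1.0, pi0 * null_frac / obs_frac)


def max_distance_threshold(observed, null, alpha=0.05, pi0=1.0):
    """Largest observed distance whose estimated tail-area FDR is <= alpha,
    or None if no threshold qualifies."""
    best = None
    for t in sorted(set(observed)):
        if tail_fdr(t, observed, null, pi0) <= alpha:
            best = t
    return best


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    observed = [0, 0, 0, 1, 1, 3, 4, 4]   # minimal distances of sequenced reads
    null = [2, 3, 3, 4, 4, 4, 5, 5]       # minimal distances of unbarcoded sequences
    print(max_distance_threshold(observed, null, alpha=0.05))
```

In this sketch, reads whose minimal distance is at or below the returned threshold would be called barcoded. Shorter barcodes or larger barcode sets shift the null distances toward zero, which lowers the threshold that still satisfies the chosen FDR level.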
