Journal: Bioinformatics

On genomic repeats and reproducibility

Abstract

Results: Here, we present a comprehensive analysis of the reproducibility of computational characterization of genomic variants using high-throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters and altering only the order of the reads in the input (i.e., the FASTQ file). Reshuffling caused reads from repetitive regions to be mapped to different locations in the second alignment, and we observed similar results when we applied only a scatter/gather approach to read mapping, without prior shuffling. Our results show that some of the most common variation discovery algorithms do not handle ambiguous read mappings accurately when random locations are selected. In addition, we observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, a discrepancy we pinpoint to the variant filtration step. We conclude that algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings deterministically to ensure full replication of results.
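
The core issue, that the location chosen for an ambiguously mapped read can depend on where that read sits in the input, can be illustrated with a toy model. The Python sketch below is hypothetical: `CANDIDATES`, `place_with_shared_rng` and `place_deterministically` are illustrative names, not part of any aligner or of GATK, and the hash-based tie-break is just one possible deterministic scheme, not a method prescribed by the authors. It contrasts a tie-break that draws from a single random generator in input order (so reordering reads can move a read to a different repeat copy, even with a fixed seed) with one that derives the choice from a hash of the read name (so the same read always lands on the same copy, whatever the order).

```python
"""Toy model: order-dependent vs. order-independent placement of multi-mapped reads.

Hypothetical illustration only; it does not reproduce any real aligner's logic.
"""
import hashlib
import random

# Hypothetical input: each read maps equally well to every listed location
# (e.g. several copies of a genomic repeat). Unique reads have one location.
CANDIDATES = {
    "read1": ["chr1:1000", "chr7:5000"],
    "read2": ["chr2:300"],
    "read3": ["chr1:1000", "chr7:5000", "chrX:42"],
    "read4": ["chr3:77", "chr9:88"],
}


def place_with_shared_rng(read_names, seed=0):
    """Order-dependent: one RNG, consumed in input order.

    Even with a fixed seed, the draw a read receives depends on how many
    draws were made for the reads before it, so reordering the input can
    move a read to a different repeat copy.
    """
    rng = random.Random(seed)
    return {name: rng.choice(CANDIDATES[name]) for name in read_names}


def place_deterministically(read_names, salt="demo"):
    """Order-independent: the choice is a pure function of the read name.

    Candidates are sorted so the result also ignores the order in which
    locations were enumerated; the hash spreads reads across copies.
    """
    placements = {}
    for name in read_names:
        locations = sorted(CANDIDATES[name])
        digest = hashlib.sha256(f"{salt}:{name}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(locations)
        placements[name] = locations[index]
    return placements


if __name__ == "__main__":
    original = list(CANDIDATES)
    shuffled = original[:]
    random.Random(42).shuffle(shuffled)  # mimic reshuffling the FASTQ records

    a, b = place_with_shared_rng(original), place_with_shared_rng(shuffled)
    print("order-dependent, reads moved:", sorted(n for n in a if a[n] != b[n]))

    c, d = place_deterministically(original), place_deterministically(shuffled)
    print("deterministic, reads moved:", sorted(n for n in c if c[n] != d[n]))  # always []
```

Because the deterministic choice depends only on the read itself, it is also unaffected by how reads are split into chunks, which corresponds to the scatter/gather case described above.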