Journal: Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases

An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data

Abstract

Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred by using a pairwise genetic distance of <= 12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next-generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data from nine multidrug-resistant tuberculosis (MDR-TB) isolates, three extensively drug-resistant tuberculosis (XDR-TB) isolates, and the standard M. tuberculosis strain H37Rv, generated on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained using four common variant-calling approaches: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller with GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks, and average nucleotide identity (ANI) were analysed to measure the performance of the variant-calling tools. Cross-platform pairwise SNP differences were smallest with the GVCF workflow (2 to 14 SNPs) and largest with VarScan 2 (7 to 158 SNPs). ANI comparison between SNP data from the NextSeq500 and the PGM for MDR-TB and XDR-TB showed maximum ANI values of 99.7% and 99.0%, respectively, with the GVCF workflow, while the other SNP-calling results showed lower ANI, ranging from 98.6% down to 95.1%. We suggest that the GVCF workflow is the best-performing variant-calling approach for minimizing cross-platform pairwise SNP differences.
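
The threshold-based clustering mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes each isolate has already been reduced to a string of base calls at the same ordered SNP positions, that missing calls (`N` or `-`) are skipped, and that isolates are grouped by single linkage (any chain of pairs within the threshold forms one cluster). The function names and the toy isolates are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned SNP profiles.

    Positions where either profile has a missing call ('N' or '-')
    are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP positions")
    missing = {"N", "-"}
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x not in missing and y not in missing
    )


def cluster_isolates(profiles: dict, threshold: int = 12) -> list:
    """Single-linkage clustering: link isolates whose distance <= threshold."""
    parent = {name: name for name in profiles}

    def find(x):
        # Walk up to the root of x's set, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical toy profiles (8 SNP positions each), not data from the study.
toy = {
    "iso_A": "ACGTACGT",
    "iso_B": "ACGTACGA",
    "iso_C": "TTTTACGA",
}
# A threshold of 1 is used only so the tiny example splits into two clusters.
clusters = cluster_isolates(toy, threshold=1)
```

With these toy profiles, `iso_A` and `iso_B` differ at one position and fall into one cluster at a threshold of 1, while `iso_C` stays separate. At the 12-SNP threshold quoted in the abstract, all three would be joined. This is why cross-platform discrepancies matter: a caller that inflates pairwise differences can push truly related isolates past the threshold.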
