Journal: BMC Bioinformatics

VarBin, a novel method for classifying true and false positive variants in NGS data


Abstract

Background: Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening up to millions of variants to find the one or few causative variants. Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Existing methods for removing false positive variants often still retain many of them. This report presents VarBin, a method to prioritize variants based on a predicted likelihood that each variant is a false positive.

Methods: VarBin uses the Genome Analysis Toolkit (GATK) variant calling software to calculate, for each variant change and position, the variant-to-wild type genotype likelihood ratio divided by read depth. The resulting Phred-scaled likelihood ratio by depth (PLRD) was used to segregate variants into 4 Bins, with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq exome and whole genome samples (the proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident from variant quality scores and biases, and visible in the IGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Each proband variant is assigned a VarBin classification (Bin 1 to Bin 4) according to how far its PLRD value is separated from the cluster of wild type/non-variant PLRD values for the background samples at the same variant change and position.

Results: To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4.

Conclusions: These data indicate that VarBin correctly classifies the majority of true variants as Bin 1, and that Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2.
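
The PLRD quantity described in the Methods can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the form used here is an assumption: the difference between the Phred-scaled likelihoods (as in the VCF `PL` field) of the wild type genotype and the variant genotype, divided by read depth. The function names `plrd` and `separation`, the idea of measuring separation as the gap to the highest background value, and all numbers are hypothetical and chosen only for illustration. The Bin cutoffs are not stated in the abstract and are therefore not modelled.

```python
def plrd(pl_ref: float, pl_variant: float, depth: int) -> float:
    """Phred-scaled likelihood ratio by depth (assumed form).

    pl_ref and pl_variant are Phred-scaled genotype likelihoods
    (lower means more likely), so pl_ref - pl_variant is the
    Phred-scaled variant-to-wild type likelihood ratio.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return (pl_ref - pl_variant) / depth


def separation(proband_plrd: float, background_plrds: list[float]) -> float:
    """Gap between the proband PLRD and the highest background PLRD.

    This is a hypothetical separation measure, not the published rule.
    """
    return proband_plrd - max(background_plrds)


# Made-up values: a proband variant call and three background
# wild type/non-variant calls at the same position.
proband = plrd(pl_ref=450, pl_variant=0, depth=30)
background = [plrd(0, 90, 30), plrd(0, 120, 40), plrd(0, 60, 20)]
gap = separation(proband, background)

print(proband, background, gap)
```

With these made-up inputs the background calls sit at -3 PLRD and the proband call at 15 PLRD, matching the pattern the abstract describes for sites without apparent sequencing or alignment error. A site with systematic calling problems would show higher, more scattered background values and a smaller gap.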
