Genome Biology

Performance difference of graph-based and alignment-based hybrid error correction methods for error-prone long reads

Abstract

Illustration of the alignment-based and graph-based methods, with results for model fitness and accuracy gain on simulated data.

Schematic of the alignment-based method. The figure marks a base on the long read and the corresponding base on the reference sequence. The real short reads are aligned to the long read, with a subset of them aligning successfully. The consensus is then inferred at each base.

Relationship of the successful alignment probability for short reads with three quantities: the mismatch rate p, the lower threshold on the perfect-match k-mer size, and the upper threshold on mismatches. Regardless of changes to either or both thresholds, the probability is close to one when p > 30%. This indicates that the mismatch rate is the dominant factor. As the threshold distinguished by line color increases from 10 to 20, the curves shift upward (from blue to red and green), so the probability increases with that threshold. The gap between the dashed and solid line of each color also widens, meaning that the effect of the other threshold on the probability grows as well.

Schematic of the graph-based error correction method. A de Bruijn graph (DBG) is built from the short reads, and solid k-mers are detected on the long reads. The fragment between two adjacent solid k-mers is then aligned with the corresponding path in the DBG. The path is used to correct the fragment when certain criteria are met.

Accuracy gain at each error rate for simulated long reads corrected by the alignment-based method. Boxplots show the distribution of accuracy gain across long reads. Solid lines show the theoretical values. Dashed gray (diagonal) lines correspond to perfect correction.

Proportion of simulated long reads in which a solid k-mer is detected at each error rate level. Solid lines show the theoretical values, and dashed lines show the results on simulated long reads.

Accuracy gain at each error rate for simulated long reads corrected by the graph-based method.

The panels are parameterized by the long read length and by k, the size of the perfectly matched seed or solid k-mer.
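Both methods depend on seeds. A short read aligns only if it shares a perfectly matched seed of at least k bases with the long read and stays under a mismatch cap. A long read can be corrected by the graph only if it contains error-free k-mers to serve as solid k-mers. A minimal Python sketch of this kind of calculation follows. It assumes that errors occur independently at a fixed per-base rate p and that the two short-read criteria must both hold; both are assumptions, not the paper's published model. It computes the seed probability by dynamic programming over the length of the current error-free run. The function `prob_seed_found` and its parameters `length` and `max_mismatches` are hypothetical names introduced here for illustration.

```python
from typing import Dict, Optional, Tuple


# Hypothetical helper (not from the paper): probability that a sequence of
# `length` bases, each wrong independently with probability p, contains at
# least one error-free run of >= k bases and, if max_mismatches is given,
# has no more than max_mismatches errors in total.
def prob_seed_found(length: int, p: float, k: int,
                    max_mismatches: Optional[int] = None) -> float:
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 1 or length < 0:
        raise ValueError("need k >= 1 and length >= 0")

    # State: (current error-free run length capped at k, seed found yet?,
    # errors so far). Once a seed is found the run length is irrelevant, so it
    # is pinned to k; without a mismatch cap the error count stays at 0.
    State = Tuple[int, bool, int]
    dp: Dict[State, float] = {(0, False, 0): 1.0}

    for _ in range(length):
        nxt: Dict[State, float] = {}
        for (run, found, errs), prob in dp.items():
            # Next base is correct: extend the error-free run.
            run_ok = min(run + 1, k)
            found_ok = found or run_ok >= k
            key = (k if found_ok else run_ok, found_ok, errs)
            nxt[key] = nxt.get(key, 0.0) + prob * (1.0 - p)

            # Next base is an error: reset the run and count the mismatch.
            # Paths that exceed the cap are dropped because they cannot align.
            new_errs = errs + 1 if max_mismatches is not None else 0
            if max_mismatches is None or new_errs <= max_mismatches:
                key = (k if found else 0, found, new_errs)
                nxt[key] = nxt.get(key, 0.0) + prob * p
        dp = nxt

    return sum(prob for (_, found, _), prob in dp.items() if found)


if __name__ == "__main__":
    # Illustrative parameter values only, not taken from the paper.
    # Short read: 100 bp, seed of at least 20 bp, at most 10 mismatches.
    for rate in (0.05, 0.15, 0.30):
        short = prob_seed_found(100, rate, 20, max_mismatches=10)
        print("short read", rate, round(short, 4))
    # Long read: 10 kb, at least one error-free 21-mer, no mismatch cap.
    for rate in (0.05, 0.15, 0.30):
        print("long read", rate, round(prob_seed_found(10_000, rate, 21), 4))
```

Passing `max_mismatches=None` skips the error count. The state space then stays at about k + 1 states, which keeps the 10 kb long-read case cheap. With a cap, it grows to roughly k times `max_mismatches`. The sketch only asks whether an error-free k-mer exists on the long read. It ignores whether that k-mer is also present in the short-read DBG, which the solid k-mer definition requires, so for long reads it is best read as an optimistic estimate under these assumptions.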