Journal: Bioinformatics

Improved gap size estimation for scaffolding algorithms


Abstract

Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read pairs. Scaffolding provides estimates of the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead subsequent analysis, it is important to provide unbiased estimates of contig distance.

Results: In this article, we show that state-of-the-art scaffolding programs use an incorrect model for gap size estimation. We discuss why current maximum likelihood estimators are biased and describe the different cases of bias that arise. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We explain why this estimate is sound and show empirically that it outperforms the gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, for structural variation detection and for library insert-size estimation as commonly performed by read aligners.
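The kind of bias described in the abstract can be illustrated with a small sketch. This is not the paper's equation, only a simplified model under loudly stated assumptions: library insert sizes are normal with known mean `mu` and standard deviation `sigma`, both contigs are much longer than any insert, reads have a fixed length `read_len`, and the gap is non-negative. Under these assumptions a fragment of length x can span a gap g in (x - g - 2*read_len + 1) positions, so longer fragments are over-represented among spanning pairs. All function and parameter names below are hypothetical.

```python
import math


def naive_gap(observations, mu):
    """Naive estimate: library mean minus the mean observed on-contig length."""
    return mu - sum(observations) / len(observations)


def log_normal_pdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def spanning_normalizer(gap, mu, sigma, read_len):
    """Sum over fragment lengths x of f(x) * (number of spanning placements).

    Assumption: contigs are long, so a fragment of length x spans the gap
    in (x - gap - 2*read_len + 1) positions when that count is positive.
    """
    x_max = int(mu + 8 * sigma)  # truncate the far tail of the library
    total = 0.0
    for x in range(gap + 2 * read_len, x_max + 1):
        placements = x - gap - 2 * read_len + 1
        total += math.exp(log_normal_pdf(x, mu, sigma)) * placements
    return total


def log_likelihood(gap, observations, mu, sigma, read_len):
    """Log-likelihood of a gap size given the observed on-contig lengths.

    Each observation o is the fragment length minus the gap, so the
    fragment length implied by a candidate gap is o + gap. Placement
    counts of the observations do not depend on the gap and are dropped.
    """
    norm = spanning_normalizer(gap, mu, sigma, read_len)
    if norm <= 0.0:
        return float("-inf")
    ll = -len(observations) * math.log(norm)
    for o in observations:
        ll += log_normal_pdf(o + gap, mu, sigma)
    return ll


def ml_gap(observations, mu, sigma, read_len, lo=0, hi=None):
    """Grid search over integer gap sizes for the maximum likelihood."""
    if hi is None:
        hi = int(mu + 4 * sigma)
    return max(
        range(lo, hi + 1),
        key=lambda g: log_likelihood(g, observations, mu, sigma, read_len),
    )


if __name__ == "__main__":
    obs = [200, 215, 225, 235, 250]  # made-up on-contig lengths
    print("naive:", naive_gap(obs, mu=500))
    print("ML   :", ml_gap(obs, mu=500, sigma=50, read_len=50))
```

In this simplified setting the naive estimate ignores the normalizing term, which shrinks as the gap grows, so it tends to underestimate the gap. With short contigs the placement count would also be capped by contig length, which this sketch does not model.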
