Journal: Bioinformatics

OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing

Abstract

MOTIVATION: Next-generation DNA sequencing platforms are becoming increasingly cost-effective and can provide an enormous number of reads in a relatively short time. However, their accuracy and read lengths still lag behind those of the conventional Sanger sequencing method. The performance of next-generation sequencing platforms is fundamentally limited by various imperfections in the sequencing-by-synthesis and signal acquisition processes. This drives the search for accurate, scalable and computationally tractable base calling algorithms capable of accounting for such imperfections.

RESULTS: Relying on a statistical model of the sequencing-by-synthesis process and the signal acquisition procedure, we develop a computationally efficient base calling method for Illumina's sequencing technology (specifically, the Genome Analyzer II platform). Parameters of the model are estimated via a fast unsupervised online learning scheme, which uses the generalized expectation-maximization algorithm and requires only 3 s of running time per tile (on a single core of an Intel i7 machine at 3.07 GHz), a speed-up of three orders of magnitude over existing parametric model-based methods. To minimize the latency between the end of the sequencing run and the generation of the base calling reports, we develop a fast, scalable online decoding algorithm, which requires only 9 s per tile and achieves significantly lower error rates than Illumina's base calling software. Moreover, we demonstrate that the proposed online parameter estimation scheme efficiently computes tile-dependent parameters, which can then be provided to the base calling algorithm, resulting in significant improvements over previously developed base calling methods for the considered platform in terms of performance, time/complexity and latency.

AVAILABILITY: A C code implementation of our algorithm can be downloaded from http://www.cerc.utexas.edu/OnlineCall/

CONTACT: hvikalo@ece.utexas.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
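
To give a feel for what "online" parameter estimation with an EM-type update can look like, here is a minimal sketch in Python. It is not the paper's sequencing-by-synthesis model and not its generalized EM update: it swaps in a textbook stochastic-approximation online EM on a toy 1-D Gaussian mixture, and every name in it (`online_em_gmm`, `step_power`, `warmup`) is hypothetical. The point it illustrates is only that parameters can be refined one observation at a time from running averages of sufficient statistics, without a batch pass over all the data.

```python
import math
import random


def online_em_gmm(samples, mu_init, var_init=1.0, step_power=0.6, warmup=20):
    """Toy online EM for a 1-D Gaussian mixture (illustrative assumption,
    not the OnlineCall model). Returns (weights, means, variances)."""
    k = len(mu_init)
    weights = [1.0 / k] * k
    means = list(mu_init)
    variances = [var_init] * k

    # Running averages of the sufficient statistics E[r], E[r*x], E[r*x^2],
    # initialised to be consistent with the starting parameters.
    s0 = list(weights)
    s1 = [w * m for w, m in zip(weights, means)]
    s2 = [w * (v + m * m) for w, m, v in zip(weights, means, variances)]

    for t, x in enumerate(samples, start=1):
        # E-step for a single sample: posterior responsibility per component.
        dens = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens) or 1e-300  # guard against numerical underflow
        resp = [d / total for d in dens]

        # Stochastic-approximation update with a decaying step size.
        gamma = (t + 1) ** (-step_power)
        for j in range(k):
            s0[j] += gamma * (resp[j] - s0[j])
            s1[j] += gamma * (resp[j] * x - s1[j])
            s2[j] += gamma * (resp[j] * x * x - s2[j])

        # M-step from the running statistics, skipped during a short warm-up
        # so that the first few samples do not dominate the estimates.
        if t > warmup:
            for j in range(k):
                denom = max(s0[j], 1e-12)
                weights[j] = s0[j]
                means[j] = s1[j] / denom
                variances[j] = max(s2[j] / denom - means[j] ** 2, 1e-6)

    return weights, means, variances


if __name__ == "__main__":
    # Synthetic stream: an equal mix of N(0, 1) and N(5, 1) draws.
    random.seed(0)
    stream = [
        random.gauss(0.0, 1.0) if random.random() < 0.5 else random.gauss(5.0, 1.0)
        for _ in range(5000)
    ]
    print(online_em_gmm(stream, mu_init=[-1.0, 6.0]))
```

Each sample is processed once and then discarded, so the cost per observation is constant in the stream length. That property is what makes online schemes attractive when estimates are needed with low latency, as the abstract describes for tile-dependent parameters.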