Journal: BMC Bioinformatics

zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm


Abstract

Background: Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library.

Results: We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models. Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library.

Conclusions: We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.
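The idea of reusing forward-algorithm work across repeated substrings can be illustrated with a small sketch. This is not zipHMM's API or its actual implementation; every function name here (`compress`, `symbol_matrix`, `forward_loglik`, `forward_loglik_compressed`) is hypothetical. The sketch assumes the forward recursion is written as a product of per-symbol matrices, repeatedly replaces the most frequent adjacent pair of symbols with a new symbol, and precomputes that new symbol's matrix once. Unlike a production library, it does not rescale the precomputed matrices, so it is only suitable for a few compression rounds.

```python
import math
from collections import Counter

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def symbol_matrix(trans, emit, o):
    """M_o[i][j] = P(state j | state i) * P(symbol o | state j)."""
    n = len(trans)
    return [[trans[i][j] * emit[j][o] for j in range(n)] for i in range(n)]

def compress(obs, n_symbols, rounds):
    """Model-independent preprocessing: replace frequent adjacent pairs.

    The first observation is left alone because it is combined with the
    initial distribution rather than with a transition.
    """
    seq, pairs = list(obs), []  # pairs[k] defines symbol n_symbols + k
    for _ in range(rounds):
        counts = Counter(zip(seq[1:], seq[2:]))
        if not counts:
            break
        pair, count = counts.most_common(1)[0]
        if count < 2:
            break
        new_sym = n_symbols + len(pairs)
        pairs.append(pair)
        out, i = [seq[0]], 1
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, pairs

def _run(init, emit, seq, mats):
    """Scaled forward pass over seq using one matrix per symbol."""
    n = len(init)
    alpha = [init[i] * emit[i][seq[0]] for i in range(n)]
    loglik = 0.0
    for step in range(len(seq)):
        if step > 0:
            M = mats[seq[step]]
            alpha = [sum(alpha[i] * M[i][j] for i in range(n)) for j in range(n)]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)
    return loglik

def forward_loglik(init, trans, emit, obs):
    """Plain forward algorithm: linear in len(obs), quadratic in states."""
    mats = [symbol_matrix(trans, emit, o) for o in range(len(emit[0]))]
    return _run(init, emit, obs, mats)

def forward_loglik_compressed(init, trans, emit, obs, rounds=4):
    """Forward algorithm on the compressed sequence."""
    n_symbols = len(emit[0])
    seq, pairs = compress(obs, n_symbols, rounds)
    mats = [symbol_matrix(trans, emit, o) for o in range(n_symbols)]
    for a, b in pairs:  # each pair only refers to earlier symbols
        mats.append(mat_mul(mats[a], mats[b]))
    return _run(init, emit, seq, mats)

# Toy two-state model over a three-letter alphabet (values are made up).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2] * 20
print(forward_loglik(init, trans, emit, obs))
print(forward_loglik_compressed(init, trans, emit, obs))
```

In this sketch `compress` depends only on the observation sequence, which mirrors the abstract's point that the preprocessing is independent of the concrete model and can be reused across models. Each new symbol costs one matrix-matrix product, so the approach pays off only when the pair occurs often enough to amortise that cost.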
