首页> 外文期刊>BMC Bioinformatics >Linear space string correction algorithm using the Damerau-Levenshtein distance
【24h】

Linear space string correction algorithm using the Damerau-Levenshtein distance

机译:使用Damerau-Levenshtein距离的线性空间串校正算法

获取原文
       

摘要

The Damerau-Levenshtein (DL) distance metric has been widely used in the biological science. It tries to identify the similar region of DNA,RNA and protein sequences by transforming one sequence to the another using the substitution, insertion, deletion and transposition operations. Lowrance and Wagner have developed an O(mn) time O(mn) space algorithm to find the minimum cost edit sequence between strings of length m and n, respectively. In our previous research, we have developed algorithms that run in O(mn) time using only O(s?min{m,n}+m+n) space, where s is the size of the alphabet comprising the strings, to compute the DL distance as well as the corresponding edit sequence. These are so far the fastest and most space efficient algorithms. In this paper, we focus on the development of algorithms whose asymptotic space complexity is linear. We develop linear space algorithms to compute the Damerau-Levenshtein (DL) distance between two strings and determine the optimal trace (corresponding edit operations.)Extensive experiments conducted on three computational platforms–Xeon E5 2603, I7-x980 and Xeon E5 2695–show that, our algorithms, in addition to using less space, are much faster than earlier algorithms. Besides using less space than the previously known algorithms,significant run-time improvement was seen for our new algorithms on all three of our experimental platforms. On all platforms, our linear-space cache-efficient algorithms reduced run time by as much as 56.4% and 57.4% in respect to compute the DL distance and an optimal edit sequences compared to previous algorithms. Our multi-core algorithms reduced the run time by up to 59.3% compared to the best previously known multi-core algorithms.
机译:Damerau-Levenshtein(DL)距离度量已被广泛应用于生物科学。它试图通过使用取代,插入,删除和转置操作将一个序列转化为另一个序列来识别DNA,RNA和蛋白质序列的类似区域。 Lowrance和Wagner已经开发了一个O(MN)时间O(MN)空间算法,可以分别在长度M和N之间找到最小成本编辑序列。在我们以前的研究中,我们仅开发了仅使用O(s?min {m,n} + m + n)空间在O(MN)时间中运行的算法,其中S是包含字符串的字母表的大小,以计算DL距离以及相应的编辑序列。这是迄今为止最快,最空间的高效算法。在本文中,我们专注于算法的算法的发展,其渐近空间复杂性是线性的。我们开发线性空间算法以计算两个字符串之间的Damerau-Levenshtein(DL)距离,并确定在三个计算平台上进行的广泛实验 - Xeon E5 2603,I7-X980和Xeon E5 2695显示除了使用更少的空间之外,我们的算法比早期算法快得多。除了使用比以前已知的算法的空间更少,除了我们所有三个实验平台上的新算法,还可以看到显着的运行时间改进。在所有平台上,我们的线性空间高速缓存高效算法在与先前的算法相比,在计算DL距离和最佳编辑序列时,将运行时间减少多达56.4%和57.4%。与最佳先前已知的多核算法相比,我们的多核算法将运行时间减少至59.3%。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号