Linear space string correction algorithm using the Damerau-Levenshtein distance

Chunchun Zhao; Sartaj Sahni

首页> 外文期刊>BMC Bioinformatics >Linear space string correction algorithm using the Damerau-Levenshtein distance

【24h】

Linear space string correction algorithm using the Damerau-Levenshtein distance

机译：使用Damerau-Levenshtein距离的线性空间串校正算法

获取原文

掌桥外文数据库（机构版） >>

开具论文收录证明 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

The Damerau-Levenshtein (DL) distance metric has been widely used in the biological science. It tries to identify the similar region of DNA,RNA and protein sequences by transforming one sequence to the another using the substitution, insertion, deletion and transposition operations. Lowrance and Wagner have developed an O(mn) time O(mn) space algorithm to find the minimum cost edit sequence between strings of length m and n, respectively. In our previous research, we have developed algorithms that run in O(mn) time using only O(s?min{m,n}+m+n) space, where s is the size of the alphabet comprising the strings, to compute the DL distance as well as the corresponding edit sequence. These are so far the fastest and most space efficient algorithms. In this paper, we focus on the development of algorithms whose asymptotic space complexity is linear. We develop linear space algorithms to compute the Damerau-Levenshtein (DL) distance between two strings and determine the optimal trace (corresponding edit operations.)Extensive experiments conducted on three computational platforms–Xeon E5 2603, I7-x980 and Xeon E5 2695–show that, our algorithms, in addition to using less space, are much faster than earlier algorithms. Besides using less space than the previously known algorithms,significant run-time improvement was seen for our new algorithms on all three of our experimental platforms. On all platforms, our linear-space cache-efficient algorithms reduced run time by as much as 56.4% and 57.4% in respect to compute the DL distance and an optimal edit sequences compared to previous algorithms. Our multi-core algorithms reduced the run time by up to 59.3% compared to the best previously known multi-core algorithms.

机译：Damerau-Levenshtein（DL）距离度量已被广泛应用于生物科学。它试图通过使用取代，插入，删除和转置操作将一个序列转化为另一个序列来识别DNA，RNA和蛋白质序列的类似区域。 Lowrance和Wagner已经开发了一个O（MN）时间O（MN）空间算法，可以分别在长度M和N之间找到最小成本编辑序列。在我们以前的研究中，我们仅开发了仅使用O（s？min {m，n} + m + n）空间在O（MN）时间中运行的算法，其中S是包含字符串的字母表的大小，以计算DL距离以及相应的编辑序列。这是迄今为止最快，最空间的高效算法。在本文中，我们专注于算法的算法的发展，其渐近空间复杂性是线性的。我们开发线性空间算法以计算两个字符串之间的Damerau-Levenshtein（DL）距离，并确定在三个计算平台上进行的广泛实验 - Xeon E5 2603，I7-X980和Xeon E5 2695显示除了使用更少的空间之外，我们的算法比早期算法快得多。除了使用比以前已知的算法的空间更少，除了我们所有三个实验平台上的新算法，还可以看到显着的运行时间改进。在所有平台上，我们的线性空间高速缓存高效算法在与先前的算法相比，在计算DL距离和最佳编辑序列时，将运行时间减少多达56.4％和57.4％。与最佳先前已知的多核算法相比，我们的多核算法将运行时间减少至59.3％。

著录项

来源
《BMC Bioinformatics》 |2020年第1期|共21页
作者
Chunchun Zhao; Sartaj Sahni;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种
中图分类生物科学;
关键词
Edit distanceDamerau-Levenshtein distanceLinear spaceString correction;

机译：编辑DistanceMerau-Levenshtein Distancelinear Spacestring校正;

相似文献

外文文献
中文文献
专利

1. String correction using the Damerau-Levenshtein distance [J] . Chunchun Zhao, Sartaj Sahni BMC Bioinformatics . 2019,第S11期

机译：使用Damerau-Levenshtein距离的字符串校正
2. Correction to: New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [J] . Ho ThienLuan, Oh Seung-Rohk, Kim HyunJin Journal of supercomputing . 2018,第5期

机译：更正为：在汉明距离下用于定长近似字符串匹配和近似圆形字符串匹配的新算法
3. Fast and Linear-Time String Matching Algorithms Based on the Distances of q-Gram Occurrences [J] . Satoshi Kobayashi, Diptarama Hendrian, Ryo Yoshinaka, LIPIcs : Leibniz International Proceedings in Informatics . 2020,第30期

机译：基于Q-GRAM出现距离的快速和线性时串匹配算法
4. Linear Space String Correction Algorithm Using The Damerau-Levenshtein Distance [C] . Chunchun Zhao, Sartaj Sahni IEEE International Conference on Computational Advances in Bio and Medical Sciences . 2018

机译：使用Dameau-Levenshtein距离的线性空间字符串校正算法
5. All-digital Calibration Algorithms for the Correction of Static Non-linearities in ADCs [D] . Chen, Paul Wenbo. 2020

机译：ALLIC静态非线性校正的全数字校准算法
6. Linear space string correction algorithm using the Damerau-Levenshtein distance [O] . Chunchun Zhao, Sartaj Sahni 2020

机译：使用Damerau-Levenshtein距离的线性空间串校正算法
7. Correction to: New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [O] . ThienLuan Ho, Seung-Rohk Oh, HyunJin Kim 2018

机译：校正：用于在汉明距离下匹配的固定长度近似串匹配和近似圆形串的新算法

Linear space string correction algorithm using the Damerau-Levenshtein distance

摘要

著录项

相似文献

相关主题

期刊订阅