首页> 外文期刊>Algorithmica >A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings
【24h】

A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings

机译:一种用于计算游程编码字符串的编辑距离的完全压缩算法

获取原文
获取原文并翻译 | 示例

摘要

A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires O(mn logmn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first "fully compressed" algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into m and n runs, m ≤ n, we present an O(mn~2)-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn~2) time.
机译:字符串学的最新趋势探索了利用文本压缩来加快字符串之间相似度计算的可能性。在这方面的研究中,行程编码是研究最早的压缩方案之一。尽管它具有简单的编码性质,但在此工作之前唯一的积极结果是In-del距离(最长公共子序列的对偶)的计算,这需要O(mn logmn)时间,其中m和n表示输入字符串。计算两个游程长度编码的字符串之间的编辑距离的最坏情况下的时间复杂度仍然取决于未压缩的字符串长度。在本文中,我们通过提供其第一个“完全压缩”算法来打破基本差距,该算法的运行时间仅取决于压缩后的字符串长度。具体来说,给定两个压缩为m和n的字符串,m≤n,我们提出了O(mn〜2)时间算法来计算字符串的编辑距离。我们的方法还产生了第一个完全压缩的解决方案,以近似匹配O(mn〜2)时间中n个运行的文本中m个运行的模式。

著录项

  • 来源
    《Algorithmica》 |2013年第2期|354-370|共17页
  • 作者

    Kuan-Yu Chen; Kun-Mao Chao;

  • 作者单位

    Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan;

    Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan;

  • 收录信息 美国《科学引文索引》(SCI);美国《工程索引》(EI);
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    compressed pattern matching; edit distance; run length;

    机译:压缩模式匹配;编辑距离;游程长度;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号