首页> 外文期刊>Software >Fast construction of space-optimized recursive automaton
【24h】

Fast construction of space-optimized recursive automaton

机译:快速构建空间优化的递归自动机

获取原文
获取原文并翻译 | 示例
       

摘要

Finite-state automata are important components in information retrieval and natural language processing software. A recursive automaton is the most compact representation of the acyclic deterministic finite-state automata. It is based on merging not only the equivalent states but also the identical substructures in an automaton. The LZ trie variant is the state-of-the-art in automata compression regarding space, but the time needed for its construction was, until now, quadratic, which has made it impractical for large inputs. In this paper, we present the first algorithm for LZ trie construction that runs in effectively linear time thereby making it an attractive choice for finite-state automata implementation. We achieve this goal by adding a new functionality to the enhanced suffix array data structure. We present two variants of the construction procedure - an optimal method regarding the final size and a method that sacrifices some compression for low intermediate memory usage. We have made the implementation of our algorithms available in an open source software package LzLex.Copyright (c) 2014 John Wiley & Sons, Ltd.
机译:有限状态自动机是信息检索和自然语言处理软件中的重要组件。递归自动机是非循环确定性有限状态自动机的最紧凑表示。它不仅基于合并等效状态,而且还基于合并自动机中的相同子结构。 LZ trie变体是关于空间的自动机压缩的最新技术,但是到目前为止,其构造所需的时间是二次的,这使得在大型输入中不可行。在本文中,我们提出了第一种用于LZ trie构建的算法,该算法在有效的线性时间内运行,从而使其成为有限状态自动机实现的有吸引力的选择。我们通过向增强的后缀数组数据结构添加新功能来实现此目标。我们介绍了构造过程的两个变体-有关最终大小的最佳方法和为降低中间内存使用率而牺牲一些压缩的方法。我们已在开源软件包LzLex中提供了算法的实现。(c)2014 John Wiley&Sons,Ltd.

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号