【24h】

Suffix Arrays on Words

机译:单词的后缀数组

获取原文
获取原文并翻译 | 示例
获取外文期刊封面目录资料

摘要

Surprisingly enough, it is not yet known how to build directly a suffix array that indexes just the κ positions at word-boundaries of a text T[1,n], taking O(n) time and O(κ) space in addition to T. We propose a class-note solution to this problem that achieves such optimal time and space bounds. Word-based versions of indexes achieving the same time/space bounds were already known for suffix trees and (compact) DAWGs . Our solution inherits the simplicity and efficiency of suffix arrays, with respect to such other word-indexes, and thus it foresees applications in word-based approaches to data compression and computational linguistics. To support this, we have run a large set of experiments showing that word-based suffix arrays may be constructed twice as fast as their full-text counterparts, and with a working space as low as 20%. The space reduction of the final word-based suffix array impacts also in their query time (i.e. less random access binary-search steps!), being faster by a factor of up to 3.
机译:出乎意料的是,还不知道如何直接构建一个后缀数组,该后缀数组仅索引文本T [1,n]的单词边界上的κ位置,另外还要占用O(n)时间和O(κ)空间T.我们为该问题提出了一个类注释解决方案,该解决方案可实现最佳的时间和空间范围。后缀树和(紧凑的)DAWG已经知道实现相同时间/空间范围的基于单词的索引版本。我们的解决方案继承了后缀数组相对于其他单词索引的简单性和效率,因此可以预见在基于单词的数据压缩和计算语言学方法中的应用。为了支持这一点,我们进行了大量的实验,表明基于单词的后缀数组的构建速度是全文本后缀数组的两倍,并且工作空间低至20%。最终的基于单词的后缀数组的空间减少也影响了它们的查询时间(即,更少的随机访问二进制搜索步骤!),速度提高了近三倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号