Suffix Arrays on Words


Abstract

Surprisingly enough, it is not yet known how to directly build a suffix array that indexes just the κ positions at word boundaries of a text T[1,n], taking O(n) time and O(κ) space in addition to T. We propose a class-note solution to this problem that achieves these optimal time and space bounds. Word-based versions of indexes achieving the same time/space bounds were already known for suffix trees and (compact) DAWGs. Our solution inherits the simplicity and efficiency of suffix arrays relative to these other word indexes, and we therefore foresee applications in word-based approaches to data compression and computational linguistics. To support this, we have run a large set of experiments showing that word-based suffix arrays can be constructed twice as fast as their full-text counterparts, with a working space as low as 20%. The space reduction of the final word-based suffix array also improves query time (i.e., fewer random-access binary-search steps), which is faster by a factor of up to 3.
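
To make the indexed object concrete, here is a minimal Python sketch of a word-based suffix array and a binary-search query over it. It is not the O(n)-time, O(κ)-space construction proposed in the paper: it simply sorts the word-start positions by comparing full suffixes, which is much slower in the worst case. The function names (`word_starts`, `build_word_suffix_array`, `find`) and the whitespace-based definition of a word boundary are assumptions made for illustration only.

```python
def word_starts(text):
    """Positions where a word begins: a non-space character at the start
    of the text or right after whitespace (an assumed boundary rule)."""
    return [i for i, ch in enumerate(text)
            if not ch.isspace() and (i == 0 or text[i - 1].isspace())]


def build_word_suffix_array(text):
    """Naive construction: sort only the word-start positions by the
    suffix that begins at each one. Illustrative, not linear time."""
    return sorted(word_starts(text), key=lambda i: text[i:])


def find(text, sa, pattern):
    """Return the word-aligned positions whose suffix starts with pattern,
    using two binary searches over the word-based suffix array."""
    m = len(pattern)

    # Lower bound: first entry whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first entry whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


text = "to be or not to be"
sa = build_word_suffix_array(text)
print(sa)                    # 6 entries, one per word, instead of 18
print(find(text, sa, "to"))  # word-aligned occurrences of "to"
print(find(text, sa, "o"))   # only the word starting with "o" is reported
```

Because the array holds one entry per word and not one per character, each binary search runs over κ entries rather than n, which is the source of the query-time savings the abstract describes. Occurrences that do not start at a word boundary (such as the "o" inside "to") are deliberately not reported.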


Bibliographic record

Source
Annual Symposium on Combinatorial Pattern Matching (CPM 2007); July 9-11, 2007; London (CA) | 2007 | pp. 328-339 | 12 pages
Conference location: London (CA)
Authors
Paolo Ferragina; Johannes Fischer
Author affiliations

Dipartimento di Informatica, University of Pisa

Institut fuer Informatik, Ludwig-Maximilians-Universitaet Muenchen

Original format: PDF
Text language: English
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. Linear-time computation of minimal absent words using suffix array [J]. Carl Barton, Alice Heliou, Laurent Mouchard. BMC Bioinformatics. 2014, No. 1
2. Keyword-Driven Suffix Arrays for On-Line Keyword Searching from Documents in Chinese [J]. Yanhua Zhang (School of Software Engineering, University of Science and Technology of China). International Journal of Artificial Intelligence & Applications (IJAIA). 2012, No. 5
3. In-Place Update of Suffix Array While Recoding Words [J]. Matthias Galle, Pierre Peterlongo, Francois Coste. International Journal of Foundations of Computer Science. 2009, No. 6
4. Chinese Word Segmentation and Out-of-Vocabulary Words Detection Using Suffix Array [C]. Wenyan Ji, Tao Peng, Wanli Zuo. International Conference on Web Information Systems and Mining (WISM 2009). 2009
5. Suffix trees and suffix arrays in primary and secondary storage [D]. Ko, Pang. 2007
6. Linear-time computation of minimal absent words using suffix array [O]. Carl Barton, Alice Heliou, Laurent Mouchard. 2014
7. Linear-time Computation of Minimal Absent Words Using Suffix Array [O]. Carl Barton, Alice Heliou. 2016
