The suffix tree is a celebrated data structure in stringology and is used in providing efficient solutions for a plethora of problems. The main problem of suffix trees is their space usage: they may even require 20 bytes per text symbol! One solution to this issue is compressed suffix trees (CSTs). This paper proposes a new CST called grammar-compressed topology (GCT). GCT achieves low space on repetitive collections and much better times. In fact, GCT can be seen as a specialist for highly repetitive collections: the experiments of this paper show that on synthetic DNA collections with 99.9 percent similarity, GCT uses slightly higher space but runs up to three orders of magnitude faster.
展开▼