Textual databases for e.g. biological or web-data are growing rapidly, and it is often only feasible to store the data in compressed form. However, compressing the data comes at a price. Traditional algorithms for e.g. pattern matching requires all data to be decompressed - a computationally demanding task. In this thesis we design data structures for accessing and searching compressed data efficiently.Our results can be divided into two categories. In the first category we study problems related to pattern matching. In particular, we present new algorithms for counting and comparing substrings, and a new algorithm for finding all occurrences of a pattern in which we may insert gaps. In the other category we deal with accessing and decompressing parts of the compressed string. We show how to quickly access a single character of the compressed string, and present a data structure that supports fast decompression of substrings from prespecified positions.
展开▼