Off-line dictionary compression is becoming more attractive for applications where compressed data are searched directly in compressed form. While there has been large body of related work describing specific database compression algorithms, the Hibase architecture is unique in processing queries in compressed data. However, this technique does not compress the representation of strings in the domain dictionaries. Primary keys, data with high cardinality and semi-structured data contribute very little or no compression. To achieve high performance irrespective of type of data, the string representation must be in compressed form. At the same time, the direct addressability of compressed data is maintained. Serial compression techniques cannot be used. In this paper, we present a prefix dictionary-based off-line method that can be incorporated with systems like Hibase where compressed data can be accessed directly without prior decompression. The complexity is O(n) in time and space.
展开▼