B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally
when keys are long or of variable length,
when keys are compressed, even when using front compression, the standard B-tree compression scheme,
for range queries, and
with respect to memory effects such as disk prefetching.
This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways:
The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally.
It maintains an index whose size is proportional to the front-compressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner.
It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block.
It utilizes all levels of a memoryhierarchy efficiently and makes good use of disk locality by using cache-oblivious layout strategies.