In this paper, we propose a new method, the X-tree, for indexing large amounts of point and spatial data in high-dimensional space. An analysis shows that index structures such as the R-tree are not adequate for indexing high-dimensional data sets. The major problem of R-tree-based index structures is the overlap of the bounding boxes in the directory, which increases with the dimensionality of the data. To avoid this problem, we introduce a new organization of the directory that uses an overlap-minimizing split algorithm and additionally utilizes the concept of supernodes. The basic idea of the overlap-minimizing split and of supernodes is to keep the directory as hierarchical as possible while avoiding splits in the directory that would result in high overlap. Our experiments show that for high-dimensional data, the X-tree outperforms the well-known R-tree and the TV-tree by up to two orders of magnitude.
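The trade-off between splitting a directory node and keeping it as a supernode can be illustrated with a minimal sketch. It is not the split algorithm of the paper: the overlap measure (overlap volume divided by the volume of the union of the two bounding boxes), the threshold `max_overlap`, and all function names are assumptions chosen for illustration only.

```python
from math import prod

# A box is a pair (lows, highs) of equal-length coordinate tuples.

def mbr(boxes):
    """Minimum bounding rectangle enclosing a list of boxes."""
    dims = range(len(boxes[0][0]))
    lows = tuple(min(b[0][d] for b in dims_boxes(boxes)) for d in dims)
    highs = tuple(max(b[1][d] for b in dims_boxes(boxes)) for d in dims)
    return lows, highs

def dims_boxes(boxes):
    """Identity helper kept separate so mbr() reads per dimension."""
    return boxes

def volume(box):
    """Volume of an axis-aligned box."""
    lows, highs = box
    return prod(h - l for l, h in zip(lows, highs))

def overlap_volume(a, b):
    """Volume of the intersection of two boxes (0.0 if they are disjoint)."""
    sides = [min(a[1][d], b[1][d]) - max(a[0][d], b[0][d])
             for d in range(len(a[0]))]
    return prod(sides) if all(s > 0 for s in sides) else 0.0

def split_or_supernode(group_a, group_b, max_overlap=0.2):
    """Accept a candidate split only if the two resulting directory
    boxes overlap little; otherwise keep one enlarged node (supernode).
    max_overlap is a hypothetical threshold, not a value from the paper."""
    box_a, box_b = mbr(group_a), mbr(group_b)
    shared = overlap_volume(box_a, box_b)
    union = volume(box_a) + volume(box_b) - shared
    ratio = shared / union if union > 0 else 0.0
    decision = "split" if ratio <= max_overlap else "supernode"
    return decision, ratio
```

With two disjoint groups the candidate split is accepted; with two groups whose bounding boxes share a large region, the sketch keeps the node unsplit, which mirrors the stated goal of avoiding directory splits that would result in high overlap.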