Data partitioning is a physical data warehouse design technique that accelerates OLAP queries and makes the warehouse easier to manage. The preferred way to partition a relational warehouse is to fragment the dimension tables and then use their fragmentation schemas to partition the fact table. This type of fragmentation may dramatically increase the number of fact table fragments and make their maintenance very costly. Moreover, the search space for selecting an optimal fragmentation schema in the data warehouse context may be exponentially large. In this paper, the horizontal fragmentation selection problem is formalised as an optimisation problem with a maintenance constraint representing the number of fragments that the data warehouse administrator can manage. To deal with this problem, we present SAGA, a hybrid method combining a genetic algorithm with simulated annealing. We conduct several experimental studies using the APB-1 release II benchmark to validate the proposed algorithms.
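The growth in fragment numbers and the maintenance constraint can be illustrated with a minimal sketch. It assumes that the fact table receives one fragment per combination of dimension fragments, so the count is the product of the per-dimension fragment counts. The dimension names, the fragment counts, and the threshold of 100 are hypothetical values chosen for illustration, not figures from the paper.

```python
from math import prod


def fact_fragment_count(dimension_fragments):
    """Fact-table fragments induced by the dimension fragmentation schemas.

    Assumption: every combination of dimension fragments yields one
    fact-table fragment, so the total is the product of the counts.
    """
    return prod(dimension_fragments.values())


def satisfies_maintenance_constraint(dimension_fragments, max_fragments):
    """True if the schema stays within the administrator's fragment budget."""
    return fact_fragment_count(dimension_fragments) <= max_fragments


# Hypothetical fragmentation schema: fragments per dimension table.
schema = {"Customer": 4, "Product": 5, "Time": 12, "Channel": 3}

print(fact_fragment_count(schema))                    # 720
print(satisfies_maintenance_constraint(schema, 100))  # False
```

Even modest per-dimension fragment counts exceed a budget of 100 fragments here, which is why a selection method has to search only among schemas that respect the constraint.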