This paper discusses efficient hash-partitioning using workload access patterns to place and process relations in a cluster or distributed query-intensive database environment. In such an environment, there is usually more than one partitioning alternative for each relation. We discuss a method and algorithm to determine the hash partitioning attributes and placement. Among the alternatives, our algorithm chooses a placement that reduces repartitioning overheads using expected or historical query workloads. The paper includes a simulation study showing how our strategy outperforms ad-hoc placement and previously proposed distributed database strategies.
展开▼