A distributed system architecture, a data distribution method, and a query partitioning and execution method are provided for distributed data warehousing systems comprising a cluster of independent nodes connected through a network. Queries are generated by data access and data analysis applications executing on client computers. The method distributes data across all physical servers (nodes) according to a uniform distribution, ensuring an optimal load balance in which all nodes hold approximately the same amount of information and each query requires approximately the same amount of data to be processed on each node. The method enables each query to be rewritten so that it is executed in parallel by all nodes, each node processing an equivalent amount of its local data, providing near-linear speed-up and scale-up.
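
The two ideas in the abstract, uniform data placement and query rewriting for parallel execution, can be illustrated with a minimal sketch. It is not the claimed method: the abstract does not say how the uniform distribution is achieved or how queries are rewritten, so round-robin placement and a sum/count decomposition of an average are assumptions chosen for illustration. The function names, the `amount` column, and the in-process lists and threads standing in for networked nodes are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute_round_robin(rows, num_nodes):
    """Assumed placement rule: row i goes to node i % num_nodes,
    so node sizes differ by at most one row."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def local_partial(node_rows):
    """Rewritten per-node query: return (sum, count) over local data
    instead of the average, so partial results can be merged exactly."""
    amounts = [row["amount"] for row in node_rows]
    return sum(amounts), len(amounts)


def parallel_average(nodes):
    """Run the partial query on every node in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_partial, nodes))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical fact table with ten rows, spread over three nodes.
rows = [{"id": i, "amount": i * 10} for i in range(1, 11)]
nodes = distribute_round_robin(rows, num_nodes=3)
node_sizes = [len(n) for n in nodes]
average = parallel_average(nodes)
```

Because every node holds nearly the same number of rows, each partial query does roughly the same amount of work, which is the load-balance property the abstract relies on for near-linear speed-up. The average is decomposed into a sum and a count because averaging the per-node averages would give a wrong result whenever node sizes differ.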