Systems and techniques by which tables can be joined in a mapreduce procedure. In some implementations, when a large table of business data (e.g., having one billion transaction records or more) is to be joined with a large table of customer data (e.g., having hundreds of millions of customer records), then these two tables can be organized before the mapreduce procedure to speed up the table join. For example, the business data and the customer data can both be hash partitioned, based on the same key, into shards of business data and shards of customer data, respectively. The number of shards in these two groups has an integer relationship with each other: for example such that there are two business data shards for every customer data shard, or vice versa.
展开▼