Collection join queries are join queries based on collection-type attributes (i.e., sets, lists, arrays, and bags). Our previous work identified three categories of collection join queries. Conventional parallel join algorithms were designed for join queries on atomic attributes and are inadequate for processing collection join queries. In this paper, we propose a hash-based parallel join algorithm for each of the collection join query types. The main difference between the proposed parallel collection join algorithms and the conventional parallel hash join is that the proposed algorithms, particularly those for collection-intersect and sub-collection joins, partition the data non-disjointly, so a record may be assigned to more than one partition. A further difference is that a new hashing technique for collections is introduced.
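To illustrate what non-disjoint partitioning means for a collection-intersect join, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes set-valued attributes with integer elements, a simple modulo hash, and partitions simulated in a single process; the names `partition_non_disjoint` and `intersect_join` are hypothetical.

```python
from collections import defaultdict


def partition_non_disjoint(records, num_partitions):
    """Copy each (key, collection) record into every partition that
    at least one of its elements hashes to (assumed hash: modulo)."""
    partitions = defaultdict(list)
    for key, collection in records:
        # A record with elements in several partitions is replicated,
        # which is what makes the partitioning non-disjoint.
        targets = {element % num_partitions for element in collection}
        for p in targets:
            partitions[p].append((key, frozenset(collection)))
    return partitions


def intersect_join(r_records, s_records, num_partitions):
    """Return the (r_key, s_key) pairs whose collections share an element."""
    r_parts = partition_non_disjoint(r_records, num_partitions)
    s_parts = partition_non_disjoint(s_records, num_partitions)
    result = set()  # a set removes pairs found in more than one partition
    for p in range(num_partitions):  # each iteration stands in for one processor
        for r_key, r_coll in r_parts.get(p, []):
            for s_key, s_coll in s_parts.get(p, []):
                if r_coll & s_coll:
                    result.add((r_key, s_key))
    return result


R = [("r1", [1, 2]), ("r2", [3]), ("r3", [4, 5])]
S = [("s1", [2, 7]), ("s2", [5, 3]), ("s3", [9])]
pairs = intersect_join(R, S, num_partitions=3)
```

Two collections that share an element are both placed in the partition of that element, so every matching pair is found locally. The cost of the replication is that the same pair can be produced in several partitions, which is why the sketch collects results in a set.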