We consider the enumeration of maximal bipartite cliques (bicliques) from alarge graph, a task central to many practical data mining problems in socialnetwork analysis and bioinformatics. We present novel parallel algorithms forthe MapReduce platform, and an experimental evaluation using Hadoop MapReduce.Our algorithm is based on clustering the input graph into smaller sizedsubgraphs, followed by processing different subgraphs in parallel. Ouralgorithm uses two ideas that enable it to scale to large graphs: (1) theredundancy in work between different subgraph explorations is minimized througha careful pruning of the search space, and (2) the load on different reducersis balanced through the use of an appropriate total order among the vertices.Our evaluation shows that the algorithm scales to large graphs with millions ofedges and tens of mil- lions of maximal bicliques. To our knowledge, this isthe first work on maximal biclique enumeration for graphs of this scale.
展开▼