Finding large patterns is an objective of computational intelligence and a key step in many data mining applications, particularly big data applications, where the scalability of mining algorithms is a major concern. This paper proposes an efficient algorithm, Pampas, that takes full advantage of the MapReduce framework to address the scalability issue. The novelty lies in two aspects. First, Pampas is the first parallel algorithm that integrates a breadth-first search strategy with a vertical mining approach. Second, Pampas combines different vertical formats to represent the data, which improves both scalability and efficiency. Extensive experimental results demonstrate that the proposed algorithm outperforms existing algorithms and scales out well with respect to both database size and cluster size.
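To make the idea of breadth-first search over a vertical data representation concrete, the following is a minimal single-process sketch, not the Pampas algorithm itself. It assumes the combined vertical formats are tidsets (the transaction ids containing an item) at the first level and diffsets (the ids lost when an itemset is extended) at deeper levels, as in Eclat-style miners; the abstract does not name the formats. The function names `to_vertical` and `mine_frequent` are hypothetical, and MapReduce partitioning is deliberately left out.

```python
from itertools import combinations


def to_vertical(transactions):
    """Convert horizontal transactions into a vertical item -> tidset map."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def mine_frequent(transactions, min_support):
    """Level-wise (breadth-first) frequent itemset mining on vertical data.

    Assumption for illustration: level 1 stores tidsets and every deeper
    level stores diffsets, so support is derived by subtraction instead of
    by intersecting full tidsets.
    """
    tidsets = to_vertical(transactions)
    # Level 1: keep frequent single items together with their tidsets.
    level = {(item,): tids for item, tids in tidsets.items()
             if len(tids) >= min_support}
    support = {itemset: len(tids) for itemset, tids in level.items()}
    result = dict(support)
    use_diffsets = False  # becomes True once the level stores diffsets

    while level:
        next_level = {}
        # Join every pair of itemsets in this level that share a prefix.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            if use_diffsets:
                # d(PXY) = d(PY) - d(PX)
                diff = level[b] - level[a]
            else:
                # d(XY) = t(X) - t(Y)
                diff = level[a] - level[b]
            # support(candidate) = support(a) - |diffset|
            candidate_support = support[a] - len(diff)
            if candidate_support >= min_support:
                next_level[candidate] = diff
                support[candidate] = candidate_support
                result[candidate] = candidate_support
        level = next_level
        use_diffsets = True
    return result
```

Calling `mine_frequent` with a list of item sets and an absolute support threshold returns a dictionary mapping each frequent itemset (as a sorted tuple) to its support. Diffsets tend to be smaller than tidsets on dense data, which is one plausible reason to mix formats, though the actual switching policy in Pampas may differ.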