Data mining for big data processing on cloud computing platforms has become an active research topic. Most previous work applies existing mining approaches directly to the raw data, which can lead to redundant computation, high time complexity, and large storage requirements. To address this, a novel heuristic approach called PASS (Pre-processing based on Small Sample) is proposed for finding, during big data pre-processing, a small sample composed of the most frequent transactions. Taking advantage of cloud computing, which can relieve the bottleneck of data mining in a distributed environment, PASS operates directly on the transaction database and groups all transactions according to their dimensions. Using Bitmap-Sort, the most frequent transactions are then screened from each transaction set. Finally, the best-transaction-set is obtained by aggregating the transaction-elects of every transaction set. Experimental results show that PASS largely avoids generating the many candidate sets produced by join operations, accelerates maximal frequent itemset mining, saves storage space, and improves resource utilization.
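The grouping, screening, and aggregation steps described above can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not define Bitmap-Sort or "dimension", so the code assumes that a transaction's dimension is its number of distinct items, that each transaction is encoded as an integer bitmap, and that screening means keeping the most frequent bitmaps in each group. The names `to_bitmap`, `small_sample`, and `top_k` are hypothetical, and the cloud-based distribution of the work is omitted.

```python
from collections import Counter, defaultdict


def to_bitmap(transaction, item_index):
    """Encode a set of items as an integer with one bit per item."""
    bits = 0
    for item in transaction:
        bits |= 1 << item_index[item]
    return bits


def small_sample(transactions, top_k=1):
    """Return the top_k most frequent transactions of each dimension.

    Assumption: "dimension" means the number of distinct items in a
    transaction. The abstract does not specify this.
    """
    # Assign every distinct item a fixed bit position (sorted for determinism).
    items = sorted({item for t in transactions for item in t})
    item_index = {item: i for i, item in enumerate(items)}

    # Group transactions by dimension and count identical bitmaps per group.
    groups = defaultdict(Counter)
    for t in transactions:
        unique = set(t)
        groups[len(unique)][to_bitmap(unique, item_index)] += 1

    # Screen each group: sort by descending count (ties broken by bitmap
    # value), keep the top_k entries, and aggregate them into one sample.
    sample = []
    for dim in sorted(groups):
        ranked = sorted(groups[dim].items(), key=lambda kv: (-kv[1], kv[0]))
        for bitmap, count in ranked[:top_k]:
            decoded = frozenset(
                item for item, i in item_index.items() if (bitmap >> i) & 1
            )
            sample.append((decoded, count))
    return sample


transactions = [
    ["a", "b"], ["b", "a"], ["a", "c"],
    ["a", "b", "c"], ["a"], ["a"], ["b"],
]
result = small_sample(transactions)
```

In this toy run, `result` holds one elected transaction per dimension together with its count. A maximal frequent itemset miner could then start from this small sample rather than from candidate sets generated by joins, which is the saving the abstract claims.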