Subgroup Discovery is a flexible supervised local pattern mining method that aims to discover interesting subgroups with respect to a property of interest. Although many efficient algorithms have been developed in this field, the growing interest in data storage has led to ever-larger datasets, which hamper the performance of these algorithms. This paper proposes two new algorithms for discovering subgroups in Big Data. To this end, the MapReduce paradigm is adopted and, specifically, Apache Spark is used to meet Big Data requirements. The experimental study considers more than 40 high-dimensional datasets and a set of efficient algorithms from the subgroup discovery field. Search spaces larger than 3.3·10^13 possible subgroups are used. The experimental analysis shows that the proposed algorithms achieve excellent efficiency, demonstrating the usefulness of Apache Spark in this field.