Rare category detection (RCD) aims to discover rare categories in a massive unlabeled data set with the help of a labeling oracle. A challenging task in RCD is to discover rare categories that are concealed by numerous data examples from major categories. Only a few algorithms have been proposed for this problem, most of which have quadratic or cubic time complexity. In this paper, we propose a novel tree-based algorithm, RCD-Forest, with O(Φ n log (n/s)) time complexity and high query efficiency, where n is the size of the unlabeled data set. Experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our method.