A typical concept-detection problem is characterised by a large disproportion between the sizes of the training populations of the concept and anti-concept classes: in most cases, the anti-concept (negative) examples vastly outnumber the concept examples. In this paper, an inverse random under-sampling method is proposed to address this imbalance problem. By inversely under-sampling the anti-concept class, we construct a large number of concept detectors whose fusion permits fine control of both the false negative and false positive rates. Because each detector is trained with the main emphasis on the concept class, the learnt discriminant functions achieve an almost perfect separation of the two classes for each detector. The proposed methodology is applied to commonly used video and image collection benchmarks, the Mediamill and Scene datasets, and the results indicate significant performance gains. For some concepts, the improvement in average precision is several orders of magnitude, and the mean average precision is 12% and 17% better on the Mediamill and Scene datasets, respectively, compared with a conventionally trained logistic regression classifier.
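The scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dataset sizes, the negative subset size, logistic regression as the base detector, and score averaging as the fusion rule are all assumptions made here; the key IRUS idea is that each detector sees every positive example plus a negative subset smaller than the positive set, inverting the usual imbalance.

```python
# Hypothetical sketch of inverse random under-sampling (IRUS).
# Assumptions (not from the paper): toy Gaussian data, negative subsets of
# size 20 vs 30 positives, sklearn LogisticRegression as each detector,
# and fusion by averaging the detectors' positive-class probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy imbalanced data: 30 concept (positive) vs 600 anti-concept (negative).
X_pos = rng.normal(loc=1.0, size=(30, 5))
X_neg = rng.normal(loc=-1.0, size=(600, 5))

def irus_fit(X_pos, X_neg, neg_subset_size=20, rng=rng):
    """Train one detector per disjoint negative subset; return the ensemble."""
    idx = rng.permutation(len(X_neg))
    detectors = []
    for start in range(0, len(X_neg), neg_subset_size):
        sub = X_neg[idx[start:start + neg_subset_size]]
        X = np.vstack([X_pos, sub])               # all positives + small negative subset
        y = np.r_[np.ones(len(X_pos)), np.zeros(len(sub))]
        detectors.append(LogisticRegression().fit(X, y))
    return detectors

def irus_score(detectors, X):
    """Fuse the ensemble by averaging positive-class probabilities."""
    return np.mean([d.predict_proba(X)[:, 1] for d in detectors], axis=0)

detectors = irus_fit(X_pos, X_neg)                # 600 / 20 = 30 detectors
scores = irus_score(detectors, np.vstack([X_pos[:3], X_neg[:3]]))
```

Each detector's training set has more positives than negatives, so learning emphasises the concept class; varying the fusion threshold on the averaged scores then trades off false negatives against false positives.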