In online advertising, an important quality-control step is to audit advertising images ("creatives") before they appear on publishers' webpages, ensuring that advertisements appear only on webpages where they are appropriate. Assigning the correct sensitive categories (such as alcohol or tobacco) to each creative is one of the most important aspects to get right. If a sensitive creative is displayed on the wrong webpage, it can harm the user's experience and the publisher's reputation, and may have legal implications. To protect against this, humans audit every creative before it is displayed through our ad exchange; this process is costly and time-consuming. This paper explains how we automated sensitive-category detection. To detect whether a creative has any sensitive content, we use a pre-trained deep convolutional neural network (Xception [1]) to process the creative image and merge this representation with the historical distribution of sensitive categories associated with the creative's landing page (the webpage that loads when the ad is clicked, which may also contain sensitive content). The merged representation is then passed through a series of fully connected layers to predict whether the creative belongs to a sensitive category. In offline testing, this model achieves slightly better than human performance (model accuracy 99.92%; human accuracy 99.88%) on a large fraction of creatives (61%), while making 3.5 times fewer mistakes in certain categories for which mistakes are especially costly. These results changed somewhat when the model was deployed at scale in production, where a small modification led to classifying fewer creatives than estimated offline, with approximately the same accuracy (52% classified with 99.87% accuracy).
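The fusion described above can be sketched in minimal form: image features from the convolutional backbone are concatenated with the landing page's historical category distribution, and the result is fed through fully connected layers to produce per-category scores. This is an illustrative sketch only; the feature dimension (2048, Xception's pooled output size), the number of sensitive categories (10), the hidden-layer width, and concatenation as the merge operation are assumptions, not the paper's exact configuration.

```python
import math
import random

random.seed(0)


def dense(x, w, b, activation=None):
    """One fully connected layer: y = activation(W x + b)."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    if activation == "relu":
        return [max(0.0, v) for v in y]
    if activation == "sigmoid":
        return [1.0 / (1.0 + math.exp(-v)) for v in y]
    return y


def random_layer(n_in, n_out):
    """Random weights standing in for a trained layer."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


# Hypothetical inputs: a stand-in for the Xception image embedding, and the
# landing page's historical distribution over 10 assumed sensitive categories.
image_features = [random.random() for _ in range(2048)]
landing_page_hist = [0.1] * 10

# "Merge" modeled as simple concatenation of the two representations.
fused = image_features + landing_page_hist

# Two fully connected layers: hidden ReLU layer, then sigmoid scores per category.
w1, b1 = random_layer(len(fused), 64)
w2, b2 = random_layer(64, 10)
hidden = dense(fused, w1, b1, activation="relu")
scores = dense(hidden, w2, b2, activation="sigmoid")

print(len(scores))
```

In a real system the weights would of course be learned end to end, and the landing-page distribution would come from historical audit logs rather than a constant vector.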