We propose and evaluate several triplet CNN architectures for measuring the similarity between sketches and photographs, within the context of the sketch-based image retrieval (SBIR) task. In contrast to recent fine-grained SBIR work, we study the ability of our networks to generalise across diverse object categories from limited training data, and explore in detail strategies for weight sharing, pre-processing, data augmentation and dimensionality reduction. We exceed the performance of pre-existing techniques on both the Flickr15k category-level SBIR benchmark by $18\%$, and the TU-Berlin SBIR benchmark by $\sim 10\,\mathcal{T}_b$, when trained on the 250-category TU-Berlin classification dataset augmented with 25k corresponding photographs harvested from the Internet.
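
For readers unfamiliar with triplet networks, the following is a minimal sketch of a standard triplet margin loss over embedding vectors, with a sketch embedding as the anchor, a same-category photograph as the positive and a different-category photograph as the negative. It is an illustration only: the squared Euclidean distance, the margin of 1.0 and the function names `squared_distance` and `triplet_loss` are assumptions, and the exact loss used in the paper may differ.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(
    anchor: Sequence[float],    # embedding of the sketch (assumed role)
    positive: Sequence[float],  # embedding of a photo from the same category
    negative: Sequence[float],  # embedding of a photo from a different category
    margin: float = 1.0,        # assumed value, not taken from the paper
) -> float:
    """Hinge-style triplet loss: zero once the negative is farther from the
    anchor than the positive by at least `margin`."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)
```

The loss is zero when the matching photograph already lies closer to the sketch than the non-matching one by the margin, and grows as that ordering is violated, which pushes embeddings of the same category together across the sketch and photograph domains.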