We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark image dataset. To identify semantically useful local features for image retrieval, we also propose an attention mechanism for keypoint selection, which shares most network layers with the descriptor. This framework can be used for image retrieval as a drop-in replacement for other keypoint detectors and descriptors, enabling more accurate feature matching and geometric verification. Our system produces reliable confidence scores to reject false positives; in particular, it is robust against queries that have no correct match in the database. To evaluate the proposed descriptor, we introduce a new large-scale dataset, referred to as the Google-Landmarks dataset, which poses challenges in both the database and the queries, such as background clutter, partial occlusion, multiple landmarks, and objects at variable scales. We show that DELF outperforms the state-of-the-art global and local descriptors in the large-scale setting by significant margins.
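The abstract states that an attention mechanism is used for keypoint selection but does not say how the attention output is turned into a keypoint set. One plausible reading, shown here only as an illustrative assumption and not as the authors' implementation, is that each feature-map location carries a local descriptor and a scalar attention score, and the highest-scoring locations are kept as keypoints. The helper `select_keypoints` and its parameters `top_k` and `min_score` are hypothetical names introduced for this sketch.

```python
from typing import List, Sequence, Tuple

Location = Tuple[int, int]
Descriptor = List[float]


def select_keypoints(
    descriptors: Sequence[Descriptor],
    locations: Sequence[Location],
    attention: Sequence[float],
    top_k: int,
    min_score: float = 0.0,
) -> List[Tuple[Location, Descriptor, float]]:
    """Hypothetical helper: keep up to top_k features ranked by attention score.

    Assumes one descriptor and one scalar attention score per feature-map
    location; this ranking rule is an assumption, not the paper's specification.
    """
    if not (len(descriptors) == len(locations) == len(attention)):
        raise ValueError("descriptors, locations and attention must have equal length")

    # Rank every location by its attention score, highest first.
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)

    selected: List[Tuple[Location, Descriptor, float]] = []
    for i in order[:top_k]:
        if attention[i] < min_score:
            # Scores are in descending order, so every remaining one is lower too.
            break
        selected.append((locations[i], list(descriptors[i]), attention[i]))
    return selected


if __name__ == "__main__":
    descs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    locs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    scores = [0.1, 0.9, 0.5, 0.7]
    for loc, _, score in select_keypoints(descs, locs, scores, top_k=2):
        print(loc, score)
```

Selecting by attention score rather than by a separate detector is one way a shared network could act as both keypoint detector and descriptor, which matches the "drop-in replacement" framing above; the selected descriptors would then feed ordinary feature matching and geometric verification.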