Vehicle localization is an important task in signal processing. In recent years, context exploration has been widely studied, particularly the modeling of nonlocal dependencies within an image using mechanisms such as attention and transformers. However, these approaches struggle to localize vehicles accurately because their queries are designed and used ineffectively. Motivated by the fact that spatial information is determined by the decoder embeddings and the details of the reference boxes, we propose a method that explicitly and dynamically models anchor boxes in the query generation module. Moreover, we design a geometry-aware data augmentation approach that increases data diversity by applying multiple augmentation methods to each image. Experiments conducted on public datasets show that our approach improves the average precision by approximately 1.1.
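The abstract does not state which augmentations are combined, so the following is only a minimal sketch of what keeping augmentation geometry-aware could look like, assuming two common transforms (horizontal flip and uniform rescaling) chained on one image. The names `hflip_boxes`, `scale_boxes`, and `augment_boxes`, the flip probability, and the scale range are all hypothetical and are not taken from the paper. Only the box coordinates are transformed here; the matching pixel transform is assumed to be applied to the image separately.

```python
import random
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def hflip_boxes(boxes: List[Box], width: float) -> List[Box]:
    """Mirror boxes about the vertical centre line of an image of the given width."""
    return [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]


def scale_boxes(boxes: List[Box], factor: float) -> List[Box]:
    """Rescale boxes by the same factor that is applied to the image."""
    return [(x1 * factor, y1 * factor, x2 * factor, y2 * factor)
            for (x1, y1, x2, y2) in boxes]


def augment_boxes(
    boxes: List[Box],
    width: float,
    height: float,
    rng: random.Random,
) -> Tuple[List[Box], Tuple[float, float]]:
    """Chain several geometric augmentations and keep the boxes consistent.

    Returns the transformed boxes and the new (width, height) of the image.
    The 0.5 flip probability and the 0.8 to 1.2 scale range are illustrative.
    """
    if rng.random() < 0.5:
        boxes = hflip_boxes(boxes, width)
    factor = rng.uniform(0.8, 1.2)
    boxes = scale_boxes(boxes, factor)
    return boxes, (width * factor, height * factor)
```

Calling `augment_boxes(boxes, width, height, random.Random(0))` repeatedly with different seeds yields several geometrically consistent variants of the same annotated image, which is the sense in which such a chain increases data diversity.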