A variation of the scale-invariant feature transform (SIFT) is introduced that pools gradient orientations across different domain sizes, in addition to spatial locations. The resulting descriptor, called DSP-SIFT, outperforms other methods in wide-baseline matching benchmarks, including those based on convolutional neural networks, despite having the same dimension as SIFT and requiring no training. The local representation of imaging data is also framed as the computation of minimal sufficient statistics that are invariant to nuisance variability induced by viewpoint and illumination. Both a sampling-based and a point-estimate-based approximation of such representations are described.
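As a rough illustration of pooling across domain sizes, the sketch below computes a simplified SIFT-like orientation histogram on square patches of several sizes around one point and averages the results. It is a minimal sketch under stated assumptions, not the published implementation: the function names, the `scales` values, the uniform averaging, and the per-size normalization are all hypothetical choices, and the histogram omits SIFT's Gaussian weighting and interpolation.

```python
import numpy as np

def orientation_histogram(patch, n_cells=4, n_bins=8):
    """Simplified SIFT-like descriptor: per-cell histograms of gradient
    orientation, weighted by gradient magnitude (no Gaussian weighting)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    # Quantize orientation; clamp guards against ang == 2*pi from rounding.
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    h, w = patch.shape
    # Assign each pixel to one of n_cells x n_cells spatial cells, so the
    # cell grid stretches with the patch size.
    rows = np.arange(h) * n_cells // h
    cols = np.arange(w) * n_cells // w
    desc = np.zeros((n_cells, n_cells, n_bins))
    np.add.at(desc, (rows[:, None], cols[None, :], bins), mag)
    return desc.ravel()

def dsp_descriptor(image, center, base_radius=8,
                   scales=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """Average orientation histograms over several domain sizes.

    Assumption: each per-size histogram is L1-normalized before averaging,
    so larger patches do not dominate simply by containing more pixels.
    """
    cy, cx = center
    height, width = image.shape
    acc = None
    for s in scales:
        r = int(round(base_radius * s))
        if cy - r < 0 or cx - r < 0 or cy + r > height or cx + r > width:
            raise ValueError("patch extends outside the image")
        hist = orientation_histogram(image[cy - r:cy + r, cx - r:cx + r])
        total = hist.sum()
        if total > 0:
            hist = hist / total
        acc = hist if acc is None else acc + hist
    acc = acc / len(scales)
    norm = np.linalg.norm(acc)
    return acc / norm if norm > 0 else acc
```

With the default 4 x 4 cells and 8 orientation bins, the pooled descriptor has 128 entries regardless of how many domain sizes are averaged, which mirrors the point that the dimension stays the same as that of SIFT.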