In this thesis, a new type of representation for medium-level vision operations is explored. We focus on representations that are sparse and monopolar. The word sparse signifies that information in the feature sets is not necessarily present at all points; on the contrary, most features are inactive. The word monopolar signifies that all features have the same sign, e.g., are either positive or zero. A zero feature value denotes "no information," and for non-zero values the magnitude signifies the relevance. A sparse scale-space representation of local image structure (lines and edges) is developed. A method known as the channel representation is used to generate sparse representations, and its ability to deal with multiple hypotheses is described. It is also shown how these hypotheses can be extracted in a robust manner. The connection of soft histograms (i.e., histograms with overlapping bins) to the channel representation, as well as to the use of dithering to reduce quantization errors, is shown. The use of soft histograms for the estimation of unknown probability density functions (PDFs) and for the estimation of image rotation is demonstrated. The advantage of using sparse, monopolar representations in associative learning is demonstrated. Finally, we show how sparse, monopolar representations can be used to speed up and improve template matching.
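To make the idea of a sparse, monopolar channel encoding concrete, the following is a minimal sketch, not the thesis implementation: it encodes a scalar value into non-negative channel activations using overlapping cos² kernels (one common choice of channel basis function; the channel centers, spacing, and kernel width here are illustrative assumptions).

```python
import numpy as np

def channel_encode(x, centers, width=1.5):
    """Encode scalar x into monopolar channel values.

    Each channel responds with a cos^2 kernel of half-support `width`
    around its center; channels farther than `width` from x stay
    exactly zero, so the resulting vector is sparse and non-negative.
    Kernel shape and width are illustrative assumptions, not the
    thesis's exact parameters.
    """
    d = np.abs(x - np.asarray(centers, dtype=float))
    return np.where(d < width,
                    np.cos(np.pi * d / (2.0 * width)) ** 2,
                    0.0)

centers = np.arange(8.0)          # channel centers at 0, 1, ..., 7
v = channel_encode(3.0, centers)  # only channels near 3 are active

# Multiple hypotheses: summing two encodings keeps both values
# represented as separate local peaks in the channel vector.
v2 = channel_encode(1.0, centers) + channel_encode(6.0, centers)
```

Note how the representation is monopolar (all values ≥ 0) and sparse (most channels are exactly zero), and how the superposition `v2` retains two distinct hypotheses side by side rather than averaging them into one value.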