This paper presents a new image retrieval method that narrows the gap between a user's subjective interpretation of similarity and the objective interpretation embedded in a similarity model. The method combines textual and object-based visual features: a novel multi-scale segmentation framework detects prominent objects in an image, the objects are grouped by their visual features, and each group is mapped to related words obtained from psychophysical studies. A hierarchy of words expressing higher-level meaning is then derived using natural language processing and user evaluation. Experiments on a large set of natural images show that this two-layer word association yields higher retrieval precision in estimating user retrieval semantics, while supporting a variety of query specifications and options.
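The two-layer association described above (visual groups → words → higher-level concepts) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, centroids, word tables, and concept tables below are all hypothetical placeholders, and the grouping step is reduced to simple nearest-centroid assignment.

```python
from collections import defaultdict

def group_objects(objects, centroids):
    """Assign each detected object (a visual-feature vector) to the
    nearest centroid by squared Euclidean distance."""
    groups = defaultdict(list)
    for obj in objects:
        best = min(range(len(centroids)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(obj, centroids[i])))
        groups[best].append(obj)
    return groups

# Layer 1: illustrative mapping from visual groups to words.
GROUP_WORDS = {0: "sky", 1: "grass"}
# Layer 2: illustrative mapping from words to higher-level concepts.
CONCEPTS = {"sky": "outdoor", "grass": "outdoor"}

def annotate(objects, centroids):
    """Return the words and higher-level concepts associated with
    an image's detected objects via the two-layer mapping."""
    words = {GROUP_WORDS[g] for g in group_objects(objects, centroids)}
    concepts = {CONCEPTS[w] for w in words}
    return words, concepts

# Toy example: three object feature vectors, two group centroids.
objects = [(0.1, 0.2, 0.9), (0.2, 0.3, 0.8), (0.3, 0.8, 0.2)]
centroids = [(0.15, 0.25, 0.85), (0.3, 0.8, 0.2)]
words, concepts = annotate(objects, centroids)
# words -> {"sky", "grass"}, concepts -> {"outdoor"}
```

A query such as "outdoor" could then match images via the concept layer even when the query term never appears among the first-layer object words.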