Traditional information annotation is usually based on textual description. With the continuing growth of multimedia resources and rapid progress in cross-media technology, annotation across different media modalities is becoming possible. This paper presents a study of a cross-media annotation technology. In this approach, multimedia examples expressed as structured information segments are submitted to a cross-media meta-search engine; the returned results, which serve as examples in other media types, are then submitted to the engine again. The correlation between the submitted examples and the returned results is calculated at each iteration, so that a cross-media semantic network is built up through this iterative process. Using the semantic network, the content of a Web page can then be explained by cross-media information perceived through different audio-visual channels. The technology is shown to be feasible and effective in the prototype system CMA (Cross Media Annotation).
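
The iterative process described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CMA implementation: the `search` and `correlate` callables, the `(modality, identifier)` item representation, the round limit, and the score threshold are all hypothetical names and parameters introduced for the example, and the toy index stands in for a real cross-media meta-search engine.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical item representation: (modality, identifier).
Item = Tuple[str, str]
Edge = Tuple[Item, Item]


def build_semantic_network(
    seeds: Iterable[Item],
    search: Callable[[Item], List[Item]],
    correlate: Callable[[Item, Item], float],
    max_rounds: int = 2,
    threshold: float = 0.0,
) -> Dict[Edge, float]:
    """Submit examples, score the results, then resubmit the results."""
    network: Dict[Edge, float] = {}
    frontier: List[Item] = list(seeds)
    seen = set(frontier)

    for _ in range(max_rounds):
        next_frontier: List[Item] = []
        for example in frontier:
            for result in search(example):
                if result == example:
                    continue
                # Correlation is computed once per (example, result) pair
                # in every round.
                score = correlate(example, result)
                if score <= threshold:
                    continue
                edge = (example, result)
                network[edge] = max(score, network.get(edge, 0.0))
                # Results not seen before become the next round's examples.
                if result not in seen:
                    seen.add(result)
                    next_frontier.append(result)
        if not next_frontier:
            break
        frontier = next_frontier
    return network


# Toy stand-ins for the meta-search engine and the correlation measure.
TOY_INDEX: Dict[Item, List[Item]] = {
    ("text", "sunset"): [("image", "sunset.jpg")],
    ("image", "sunset.jpg"): [("audio", "waves.mp3")],
}


def toy_search(example: Item) -> List[Item]:
    return TOY_INDEX.get(example, [])


def toy_correlate(a: Item, b: Item) -> float:
    # Assumed scoring: cross-modal pairs score higher than same-modal ones.
    return 1.0 if a[0] != b[0] else 0.5


network = build_semantic_network([("text", "sunset")], toy_search, toy_correlate)
for (source, target), weight in network.items():
    print(source, "->", target, weight)
```

In this sketch the network is stored as a dictionary of weighted directed edges, so a text segment from a Web page can be followed to correlated image and audio items. A real system would replace `toy_search` and `toy_correlate` with the meta-search engine and correlation measure described in the paper.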