This paper deals with the problem of semantic transcoding of CCTV video footage. A framework is proposed that combines Computer Vision algorithms, which extract visual semantics, with Natural Language Processing, which automatically builds the domain ontology from unstructured text annotations. The final aim is a system that links visual and textual semantics in order to routinely annotate video sequences with the appropriate keywords from the domain experts' terminology.