The method includes the steps of: a) for each point of interest of each image, calculating a local gradient descriptor and a local motion descriptor; b) constituting microstructures of n points of interest, each defined by a tuple of order n ≥ 1; c) determining, for each tuple, a vector of structured visual characteristics (d0 ... d3 ...) from the local descriptors; d) for each tuple, mapping this vector, by means of a classification algorithm, onto a unique codeword selected from a set of codewords forming a codebook (CB); e) generating an ordered time series of codewords (a0 ... a3 ...) for the successive images of the video sequence; and f) measuring, by means of a string kernel function, the similarity of this time series of codewords with another time series of codewords from another speaker.
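Steps d) through f) above can be sketched as follows. The abstract does not specify which classification algorithm or string kernel is used, so this sketch assumes a nearest-centroid quantizer for step d) and a p-spectrum string kernel for step f); both are illustrative stand-ins, not the patented method's actual components, and the function names are hypothetical.

```python
import numpy as np

def quantize(features, codebook):
    """Step d (assumed): map each characteristic vector to the index of its
    nearest codeword in the codebook (nearest-centroid classification)."""
    # Pairwise distances: shape (n_tuples, n_codewords).
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    # One codeword index per tuple -> the ordered time series of step e.
    return d.argmin(axis=1)

def spectrum_kernel(s, t, p=2):
    """Step f (assumed): p-spectrum string kernel, counting shared
    length-p substrings between two codeword sequences s and t."""
    def grams(seq):
        counts = {}
        for i in range(len(seq) - p + 1):
            g = tuple(seq[i:i + p])
            counts[g] = counts.get(g, 0) + 1
        return counts
    cs, ct = grams(s), grams(t)
    return sum(c * ct.get(g, 0) for g, c in cs.items())

# Toy usage: a 2-codeword codebook and three characteristic vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = np.array([[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]])
series = quantize(feats, codebook)          # e.g. array([0, 1, 0])
score = spectrum_kernel(list(series), [0, 1, 1], p=2)
```

A higher kernel value indicates that the two codeword sequences share more common subsequences of length p, i.e. greater similarity between the two speakers' video sequences under this assumed kernel.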