Embodiments may provide techniques for identification of images that reduce resource utilization through reduced sampling of video frames for visual recognition. For example, in an embodiment, a method of visual recognition processing may be implemented in a computer system comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, the method comprising: coarsely segmenting video frames of a video stream into a plurality of clusters based on scenes of the video stream; sampling a plurality of video frames from each cluster; determining a quality of each cluster; and re-clustering the video frames of the video stream to improve the quality of at least some of the clusters.
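The four recited steps may be illustrated by the following minimal sketch, which is not the claimed implementation. It assumes each video frame has already been reduced to a numeric feature vector. The function names (`coarse_segment`, `sample`, `spread`, `recluster`), the thresholds, the use of sample spread as the quality measure, and the split-at-largest-jump rule are all hypothetical choices made for illustration only.

```python
from statistics import mean


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def coarse_segment(frames, scene_threshold):
    """Group frame indices into clusters, starting a new cluster
    wherever consecutive frames differ by more than scene_threshold
    (an assumed stand-in for scene detection)."""
    if not frames:
        return []
    clusters = [[0]]
    for i in range(1, len(frames)):
        if distance(frames[i - 1], frames[i]) > scene_threshold:
            clusters.append([])
        clusters[-1].append(i)
    return clusters


def sample(cluster, k):
    """Pick up to k evenly spaced frame indices from a cluster."""
    if len(cluster) <= k:
        return list(cluster)
    step = len(cluster) / k
    return [cluster[int(j * step)] for j in range(k)]


def spread(frames, indices):
    """Assumed quality measure: mean distance of the sampled frames
    to their centroid. Lower means a more homogeneous cluster."""
    vecs = [frames[i] for i in indices]
    centroid = [mean(col) for col in zip(*vecs)]
    return mean(distance(v, centroid) for v in vecs)


def recluster(frames, clusters, k, max_spread):
    """Split each cluster whose sampled spread exceeds max_spread
    at its largest frame-to-frame jump; keep the others unchanged."""
    result = []
    for cluster in clusters:
        if len(cluster) < 2 or spread(frames, sample(cluster, k)) <= max_spread:
            result.append(cluster)
            continue
        jumps = [
            distance(frames[cluster[j - 1]], frames[cluster[j]])
            for j in range(1, len(cluster))
        ]
        cut = jumps.index(max(jumps)) + 1
        result.extend([cluster[:cut], cluster[cut:]])
    return result
```

In this sketch only the sampled frames of each cluster are examined when the quality is determined, which reflects the reduced-sampling idea. A caller might run `coarse_segment` once and then apply `recluster` repeatedly until no cluster exceeds the chosen spread.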