How to extract a meaningful representation from video has become an interesting problem in several research communities. A visual attention system detects "regions of interest" in an input video sequence. Generally, the attended regions correspond to visually prominent objects in the frames of the sequence. In this paper, we improve on previous approaches that use spatiotemporal attention modules. We propose using 3D depth map information in addition to spatiotemporal features, so that the proposed method can compensate for the inaccuracy of typical spatiotemporal saliency approaches. Motion is an important cue for deriving temporal saliency. However, the computation also picks up noise, which degrades the accuracy of the temporal saliency. To obtain a more accurate saliency map, this noise must be removed. To address the problem, we draw on the results of psychological studies of the "double opponent receptive field" and of "noise filtration" in the middle temporal (MT) area. We also apply a "FlagMap" to each frame to prevent "flickering" caused by global-area noise. As a result, our system detects the salient regions in the image with higher accuracy while removing noise effectively. Applied to several image sequences, the proposed method describes the salient regions more accurately than the typical approach does. The results can be used to generate a viewpoint chosen automatically by the system itself for a "3-D imaging projector" or 3-DTV.