A multimedia device 4, a multimedia environment 2 and a method for controlling the multimedia device 4 are provided. The multimedia environment 2 further comprises a sensor 6 for acquiring audio and/or video information. The multimedia device 4 is configured to perform gesture and/or speech recognition based on the acquired audio and/or video information. A wake-up event, which is assigned to activation of the multimedia device 4 and is initiated by a user 18 of the multimedia environment 2, is detected based on the acquired audio and/or video information. The multimedia device 4 is set to an active state upon detection of the wake-up event. Further, a position 24 of the user 22 in the multimedia environment 2 is determined. A viewing direction 26 of the user 22 may further be detected. Filtered gesture recognition is then performed based on subsequently acquired video information, wherein the filtering takes into account the determined position 24 and the determined viewing direction 26 of the user 22.
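The control flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class and function names (`Gesture`, `UserState`, `filter_gestures`, `MultimediaDevice`), the 2D sensor coordinate frame, the device location at the origin, and the thresholds `max_reach` and `tolerance` are all assumptions introduced for illustration only. The sketch assumes that a gesture candidate passes the filter when it occurs close to the determined user position and, if a viewing direction is available, when the user is looking toward the device.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gesture:
    """A gesture candidate detected in the video information (hypothetical type)."""
    label: str
    x: float
    y: float


@dataclass
class UserState:
    """Determined position and, optionally, viewing direction of the user."""
    x: float
    y: float
    view_angle: Optional[float] = None  # radians in the sensor frame; None if not detected


def is_facing(user: UserState, target: Tuple[float, float], tolerance: float) -> bool:
    """Return True if the viewing direction points at target within tolerance."""
    if user.view_angle is None:
        return True  # viewing direction is optional; skip this check if unknown
    angle_to_target = math.atan2(target[1] - user.y, target[0] - user.x)
    delta = angle_to_target - user.view_angle
    # Wrap the angular difference into [-pi, pi] before comparing.
    delta = math.atan2(math.sin(delta), math.cos(delta))
    return abs(delta) <= tolerance


def filter_gestures(
    candidates: List[Gesture],
    user: UserState,
    device_xy: Tuple[float, float] = (0.0, 0.0),  # assumed device location
    max_reach: float = 1.0,                       # assumed radius around the user
    tolerance: float = math.radians(30),          # assumed viewing tolerance
) -> List[Gesture]:
    """Keep only candidates near the user while the user looks at the device."""
    if not is_facing(user, device_xy, tolerance):
        return []
    return [
        g for g in candidates
        if math.hypot(g.x - user.x, g.y - user.y) <= max_reach
    ]


class MultimediaDevice:
    """Ignores gestures until a wake-up event sets the device to the active state."""

    def __init__(self) -> None:
        self.active = False

    def on_wake_up_event(self) -> None:
        self.active = True

    def process(self, candidates: List[Gesture], user: UserState) -> List[Gesture]:
        if not self.active:
            return []
        return filter_gestures(candidates, user)
```

In this sketch, a device that has not yet received a wake-up event returns no gestures at all; once active, it discards candidates that lie far from the user or that occur while the user looks away from the device.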