A method of generating, for a user, an audible sound whose structure the user can recognize according to the user's actions in a virtual environment (VE) is provided. A plurality of different predetermined digital audio tracks is provided using a processor. The digital audio tracks are chosen such that the user recognizes a predetermined acoustic structure when a combination of the tracks is played. Using the processor, the spatial position and size of audio regions in the VE are determined such that at least two audio regions overlap at least partially. Using the processor, each audio region is associated with one of the digital audio tracks, and the volume level distribution of that track within the audio region is determined. The VE is displayed visually to the user on a display connected to the processor. Using at least one of a user input device and a user behavior sensor connected to the processor, user information indicating the user's behavior in the VE is received. Using the processor, a spatial position in the VE is determined according to the user information. Using the processor, the volume level of the digital audio track of each audio region covering that spatial position is determined according to the spatial position and the volume level distribution of the track within the audio region. Using the processor, the digital audio tracks of the audio regions matching the spatial position are combined according to the determined volume levels, and an acoustic signal representing the combination is provided. Using a speaker connected to the processor, an audible sound according to the acoustic signal is generated and output to the user.
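The core of the method is mapping the user's spatial position in the VE to per-track volume levels via each region's volume distribution, then mixing the matching tracks. The following is a minimal sketch of that step, not the patented implementation: the region shape (circular), the linear center-to-edge falloff, and all names (`AudioRegion`, `volume_at`, `mix_levels`, the track identifiers) are illustrative assumptions, since the method leaves the distribution and region geometry unspecified.

```python
import math


class AudioRegion:
    """Hypothetical audio region: a circular area in the VE associated
    with one digital audio track, with a volume level distribution that
    falls off linearly from the center to the edge (an assumption; the
    method allows any distribution)."""

    def __init__(self, track_id, center, radius, peak_volume=1.0):
        self.track_id = track_id
        self.center = center          # (x, y) position in the VE
        self.radius = radius          # region size
        self.peak_volume = peak_volume

    def volume_at(self, position):
        """Volume level at a spatial position: peak at the center,
        zero at and beyond the region boundary."""
        dx = position[0] - self.center[0]
        dy = position[1] - self.center[1]
        dist = math.hypot(dx, dy)
        if dist >= self.radius:
            return 0.0
        return self.peak_volume * (1.0 - dist / self.radius)


def mix_levels(regions, position):
    """Determine the volume level of each region's track at the user's
    spatial position; only regions that match (cover) the position
    contribute. A mixer would then sum the tracks scaled by these levels
    to produce the acoustic signal."""
    levels = {}
    for region in regions:
        level = region.volume_at(position)
        if level > 0.0:
            levels[region.track_id] = level
    return levels


# Two partially overlapping regions, as the method requires.
regions = [
    AudioRegion("melody", center=(0.0, 0.0), radius=10.0),
    AudioRegion("drums", center=(5.0, 0.0), radius=10.0),
]

# A user position inside the overlap hears both tracks at once.
print(mix_levels(regions, (2.5, 0.0)))  # → {'melody': 0.75, 'drums': 0.75}
```

As the user moves through the VE, re-evaluating `mix_levels` at each new position crossfades the tracks, so the combination the user hears (and hence the recognized acoustic structure) changes continuously with their behavior.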