A device comprising an audio decoder, a memory, and a processor may be configured to perform various aspects of the techniques. The audio decoder may decode, from a bitstream, first audio data and second audio data. The memory may store the first audio data and the second audio data. The processor may render the first audio data into first spatial domain audio data for playback by virtual speakers at a set of virtual speaker locations, and render the second audio data into second spatial domain audio data for playback by the virtual speakers at the set of virtual speaker locations. The processor may also mix the first spatial domain audio data and the second spatial domain audio data to obtain mixed spatial domain audio data, and convert the mixed spatial domain audio data to scene-based audio data.
展开▼