Embodiments relate to providing a conference with spatialized audio for client devices. Input audio streams are received from the client devices. For each client device, placement data defining the spatial locations of the other client devices within a sound field is determined. A mixed stream for the client device, including a left mixed channel and a right mixed channel, is generated by mixing and panning the input audio streams of the other client devices according to the placement data. A spatially enhanced stream, including a left enhanced channel for a left speaker and a right enhanced channel for a right speaker, is then generated by applying subband spatial processing and crosstalk processing to the left mixed channel and the right mixed channel of the mixed stream.
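
The per-listener mixing and panning step can be illustrated with a minimal sketch. It is not the claimed implementation: the constant-power pan law, the even spread of talkers between -60 and +60 degrees, and the names `pan_gains`, `place_evenly`, and `mix_for_listener` are all assumptions made for illustration. The subband spatial processing and crosstalk processing stages are left out, since the abstract does not specify them.

```python
import math
from typing import Dict, List, Tuple


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power pan law (an assumed choice): -90 is hard left, +90 is hard right."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def place_evenly(listener: str, clients: List[str]) -> Dict[str, float]:
    """Hypothetical placement data: spread the other clients across -60..+60 degrees."""
    others = [c for c in clients if c != listener]
    if len(others) <= 1:
        # A single remote talker (or none) sits at the center of the sound field.
        return {c: 0.0 for c in others}
    step = 120.0 / (len(others) - 1)
    return {c: -60.0 + i * step for i, c in enumerate(others)}


def mix_for_listener(
    streams: Dict[str, List[float]], placement: Dict[str, float]
) -> Tuple[List[float], List[float]]:
    """Mix and pan the mono input streams named in `placement` into left/right channels."""
    if not placement:
        return [], []
    n = min(len(streams[c]) for c in placement)
    left = [0.0] * n
    right = [0.0] * n
    for client, azimuth in placement.items():
        gain_l, gain_r = pan_gains(azimuth)
        samples = streams[client]
        for i in range(n):
            left[i] += gain_l * samples[i]
            right[i] += gain_r * samples[i]
    return left, right


# Usage: build the mixed stream that client "a" would receive.
streams = {"a": [1.0, 1.0], "b": [1.0, 0.5], "c": [0.0, 1.0]}
placement_a = place_evenly("a", list(streams))
left_a, right_a = mix_for_listener(streams, placement_a)
```

Because the placement data for a listener excludes that listener, the client's own input stream never appears in its mixed stream. The resulting left and right mixed channels would then be passed to the spatial enhancement stage.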