A system and method for audio-visual multi-speaker speech separation, including: receiving audio signals captured by at least one microphone; receiving video signals captured by at least one camera; and applying audio-visual separation on the received audio signals and video signals to provide isolation of sounds from individual sources, wherein the audio-visual separation is based, in part, on angle positions of at least one speaker relative to the at least one camera.
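As an illustration of how an angle position relative to the camera might be obtained and used, the following is a minimal sketch and not the claimed method. It assumes an ideal pinhole camera with a known horizontal field of view, a linear microphone array co-located with the camera and aligned with the image x-axis, and a far-field speaker. The function names `pixel_to_angle` and `relative_arrival_times` and all parameters are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 degrees C


def pixel_to_angle(face_center_x, image_width, horizontal_fov_deg):
    """Horizontal angle (radians) of a detected face relative to the optical axis.

    Assumption: ideal pinhole camera, no lens distortion.
    Positive angles are to the right of the image center.
    """
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.atan((face_center_x - half_width) / focal_px)


def relative_arrival_times(angle_rad, mic_positions_m):
    """Arrival time (seconds) at each microphone relative to the array origin.

    Assumption: far-field (plane-wave) source; microphones lie on a line
    along the image x-axis, with positions given in metres from the camera.
    A negative value means the sound reaches that microphone earlier
    than it reaches the origin.
    """
    return [-x * math.sin(angle_rad) / SPEED_OF_SOUND_M_S for x in mic_positions_m]


# Usage: a face centered at pixel column 1440 in a 1920-pixel-wide frame,
# camera with a 90-degree horizontal field of view, two mics 10 cm apart.
angle = pixel_to_angle(1440, 1920, 90.0)
offsets = relative_arrival_times(angle, [-0.05, 0.05])
```

Offsets of this kind could be used, for example, to steer a delay-and-sum beamformer toward each visible speaker or passed as a spatial cue to a separation model; the abstract itself does not specify which approach is taken.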