AUDIO-VISUAL SPEECH SEPARATION

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
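The data flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not specify the models involved, so every function name here is hypothetical and each learned network is replaced by a trivial placeholder (neighbour averaging, log compression, a random linear map with a sigmoid). The sketch also assumes that the video frames and the spectrogram share the same number of time steps, and that a mask is applied to the mixture spectrogram by element-wise multiplication, which the abstract does not state.

```python
import math
import random

Matrix = list[list[float]]  # rows are time steps, columns are features or frequency bins


def sigmoid(z: float) -> float:
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def visual_features(face_embeddings: Matrix) -> Matrix:
    """Placeholder visual stream: average each frame with its neighbours."""
    n = len(face_embeddings)
    out = []
    for t in range(n):
        window = face_embeddings[max(0, t - 1):min(n, t + 2)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def audio_embedding(spectrogram: Matrix) -> Matrix:
    """Placeholder audio stream: log-compress each magnitude."""
    return [[math.log1p(v) for v in row] for row in spectrogram]


def combine(visual_per_speaker: list[Matrix], audio: Matrix) -> Matrix:
    """Concatenate every speaker's visual features with the audio embedding, per time step."""
    fused = []
    for t, audio_row in enumerate(audio):
        row = []
        for feats in visual_per_speaker:
            row.extend(feats[t])
        row.extend(audio_row)
        fused.append(row)
    return fused


def spectrogram_masks(av: Matrix, num_speakers: int, num_bins: int, seed: int = 0) -> list[Matrix]:
    """Placeholder mask head: one random linear map plus sigmoid per speaker."""
    rng = random.Random(seed)
    dim = len(av[0])
    masks = []
    for _ in range(num_speakers):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_bins)]
        masks.append([
            [sigmoid(sum(w * x for w, x in zip(w_row, row))) for w_row in weights]
            for row in av
        ])
    return masks


def apply_mask(mask: Matrix, spectrogram: Matrix) -> Matrix:
    """Assumed masking rule: element-wise product of mask and mixture spectrogram."""
    return [[m * s for m, s in zip(m_row, s_row)] for m_row, s_row in zip(mask, spectrogram)]


def separate(face_embeddings_per_speaker: list[Matrix], spectrogram: Matrix) -> list[Matrix]:
    """Run the steps in the order the abstract lists them; returns one spectrogram per speaker."""
    visual = [visual_features(e) for e in face_embeddings_per_speaker]
    audio = audio_embedding(spectrogram)
    av = combine(visual, audio)
    masks = spectrogram_masks(av, len(visual), len(spectrogram[0]))
    return [apply_mask(m, spectrogram) for m in masks]


# Toy input: 2 speakers, 4 time steps, 3-dimensional face embeddings, 5 frequency bins.
rng = random.Random(1)
faces = [[[rng.random() for _ in range(3)] for _ in range(4)] for _ in range(2)]
mixture = [[rng.random() for _ in range(5)] for _ in range(4)]
isolated = separate(faces, mixture)
```

With these placeholders the output has no acoustic meaning. The sketch only shows the shapes involved: each speaker receives a mask with the same dimensions as the mixture spectrogram, and therefore an isolated spectrogram of that same size.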
