AUDIO-VISUAL SPEECH SEPARATION

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
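The data flow described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the patented implementation: the random matrices are hypothetical stand-ins for learned networks, the function and parameter names (`separate`, `visual_dim`, `audio_dim`) are invented, video frames and spectrogram frames are assumed to be aligned one-to-one, and the isolated spectrogram is assumed to be the element-wise product of the mask and the input spectrogram, which the abstract does not state.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights (not from the patent)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def separate(face_embeddings, spectrogram, visual_dim=4, audio_dim=6, seed=0):
    """face_embeddings: {speaker: T x D list}; spectrogram: T x F magnitudes.

    Returns {speaker: T x F isolated spectrogram}.
    """
    rng = random.Random(seed)
    num_frames, num_bins = len(spectrogram), len(spectrogram[0])
    speakers = sorted(face_embeddings)
    embed_dim = len(face_embeddings[speakers[0]][0])

    # Step 1: per-frame face embeddings -> visual features, per speaker.
    w_visual = random_matrix(embed_dim, visual_dim, rng)
    visual = {s: matmul(face_embeddings[s], w_visual) for s in speakers}

    # Step 2: spectrogram -> audio embedding.
    w_audio = random_matrix(num_bins, audio_dim, rng)
    audio = matmul(spectrogram, w_audio)

    # Step 3: combine visual features of all speakers with the audio
    # embedding (here: per-frame concatenation) -> audio-visual embedding.
    fused = [
        sum((visual[s][t] for s in speakers), []) + audio[t]
        for t in range(num_frames)
    ]

    # Step 4: one spectrogram mask per speaker, values in (0, 1).
    # Step 5: isolated spectrogram = mask * input spectrogram (assumption).
    isolated = {}
    for s in speakers:
        w_mask = random_matrix(len(fused[0]), num_bins, rng)
        mask = [[sigmoid(v) for v in row] for row in matmul(fused, w_mask)]
        isolated[s] = [
            [m * x for m, x in zip(mask_row, spec_row)]
            for mask_row, spec_row in zip(mask, spectrogram)
        ]
    return isolated


# Toy input: 2 speakers, 5 frames, 3-dim face embeddings, 8 frequency bins.
demo_rng = random.Random(1)
faces = {
    name: [[demo_rng.random() for _ in range(3)] for _ in range(5)]
    for name in ("speaker_a", "speaker_b")
}
spec = [[demo_rng.random() for _ in range(8)] for _ in range(5)]
out = separate(faces, spec)
print({name: (len(s), len(s[0])) for name, s in out.items()})
```

Each speaker gets an output with the same time-frequency shape as the input spectrogram. Because the weights are random, the masks here carry no meaning; the sketch only shows how the stages named in the abstract connect.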

Bibliographic data

Publication number: EP3607547B1

Patent type:
Publication date: 2021-06-16

Original format: PDF
Applicant/patentee:

Application number: EP20180815918
Inventors: MOSSERI INBAR; RUBINSTEIN MICHAEL; EPHRAT ARIEL; FREEMAN WILLIAM; LANG ORAN; WILSON KEVIN WILLIAM; DEKEL TALI; HASSIDIM AVINATAN

Filing date: 2018-11-21
Classification: G10L17/18
Country: EP
Database entry time: 2022-08-24 19:22:32
