
TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING



Abstract

Techniques are disclosed that enable processing of audio data to generate one or more refined versions of the audio data, where each refined version isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version of audio data that isolates the utterance(s) of a single human speaker by applying a mask to a spectrogram representation of the audio data, where the spectrogram is generated by processing the audio data with a frequency transformation. The mask is generated by processing the spectrogram of the audio data and a speaker embedding for the single human speaker using a trained voice filter model. Output generated using the trained voice filter model is processed using an inverse of the frequency transformation to generate the refined audio data.
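The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the short-time Fourier transform stands in for the unspecified frequency transformation, and `toy_voice_filter`, its random untrained weights, the per-frame concatenation of the speaker embedding, the 16-dimensional embedding, and the constants `N_FFT` and `HOP` are all hypothetical choices made here for illustration.

```python
import numpy as np

N_FFT, HOP = 256, 64                 # assumed frame size and hop, not from the patent
WINDOW = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window


def stft(audio):
    """Frequency transformation: windowed frames -> complex spectrogram (frames, bins)."""
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack([audio[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec):
    """Inverse transformation: weighted overlap-add of the inverse-FFT frames."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    length = N_FFT + HOP * (len(frames) - 1)
    audio, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        audio[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    return audio / np.maximum(norm, 1e-8)


def toy_voice_filter(magnitude, speaker_embedding, weights, bias):
    """Hypothetical stand-in for the trained voice filter model.

    Appends the speaker embedding to every spectrogram frame, applies one
    linear layer, and squashes the result to a soft mask in [0, 1].
    """
    tiled = np.tile(speaker_embedding, (magnitude.shape[0], 1))
    features = np.concatenate([magnitude, tiled], axis=1)
    logits = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))


def refine(audio, speaker_embedding, weights, bias):
    spec = stft(audio)                                        # spectrogram of the input
    mask = toy_voice_filter(np.abs(spec), speaker_embedding, weights, bias)
    return istft(mask * spec), mask                           # masked spectrogram -> audio


# Random placeholders: a real system would use recorded audio, an embedding
# from a speaker encoder, and trained weights.
rng = np.random.default_rng(0)
audio = rng.standard_normal(N_FFT + HOP * 20)
embedding = rng.standard_normal(16)
n_bins = N_FFT // 2 + 1
weights = 0.01 * rng.standard_normal((n_bins + 16, n_bins))
bias = np.zeros(n_bins)
refined, mask = refine(audio, embedding, weights, bias)
```

Multiplying the complex spectrogram by a real-valued mask scales each time-frequency magnitude while reusing the phase of the input audio, which is one common way to apply such a mask; the abstract itself does not specify how phase is handled.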


Bibliographic data

Publication number: US2020202869A1

Publication date: 2020-06-25

Original format: PDF
Applicant/Assignee: GOOGLE LLC

Application number: US201916598172
Inventors: QUAN WANG; PRASHANT SRIDHAR; IGNACIO LOPEZ MORENO; HANNAH MUCKENHIRN

Filing date: 2019-10-10
Classification: G10L17/22; G10L25/18; G10L17/04; G10L17/18; G10L17; G10L17/02
Country: US
Date added to database: 2022-08-21 11:23:36
