TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING

Abstract

Techniques are disclosed that enable processing of audio data to generate one or more refined versions of the audio data, where each refined version isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version that isolates the utterance(s) of a single human speaker as follows. A spectrogram representation of the audio data is generated by processing the audio data with a frequency transformation. A trained voice filter model processes that spectrogram together with a speaker embedding for the single human speaker to generate a mask, and the spectrogram is processed using the mask. The output generated over the trained voice filter model is then processed using an inverse of the frequency transformation to generate the refined audio data.
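The data flow described in the abstract (frequency transformation, mask generation from the spectrogram and a speaker embedding, masking, inverse transformation) can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not in the source: the frequency transformation is taken to be a Hann-windowed short-time Fourier transform, the mask is applied to the complex spectrogram so the mixture phase is reused, and `toy_mask_model` is a hypothetical stand-in for the trained voice filter model whose "embedding" is simply a per-frequency-bin weight vector. All function names and the frame settings `N_FFT` and `HOP` are invented for the example.

```python
import numpy as np

N_FFT, HOP = 256, 64  # illustrative frame settings, not taken from the source


def stft(audio):
    """Frequency transformation: Hann-windowed frames -> complex spectrogram."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack([audio[i * HOP:i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (frames, N_FFT // 2 + 1)


def istft(spec):
    """Inverse transformation: weighted overlap-add of inverse-FFT frames."""
    window = np.hanning(N_FFT)
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    length = N_FFT + HOP * (len(frames) - 1)
    audio, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        audio[i * HOP:i * HOP + N_FFT] += frame * window
        norm[i * HOP:i * HOP + N_FFT] += window ** 2
    # Divide by the accumulated squared window; guard samples where it is ~0.
    return audio / np.maximum(norm, 1e-8)


def toy_mask_model(magnitude, speaker_embedding):
    """HYPOTHETICAL stand-in for the trained voice filter model.

    A real model would be a trained network that predicts a mask from the
    spectrogram and a learned speaker embedding. Here the "embedding" is
    assumed to be a per-bin weight in [0, 1] that is repeated for every
    frame, and the magnitude input is ignored.
    """
    return np.broadcast_to(speaker_embedding, magnitude.shape)


def separate(audio, speaker_embedding, mask_model=toy_mask_model):
    """Spectrogram -> mask -> masked spectrogram -> refined audio."""
    spec = stft(audio)
    mask = mask_model(np.abs(spec), speaker_embedding)
    return istft(mask * spec)  # assumption: mixture phase is kept as-is


# Toy demo: two tones stand in for two "speakers".
sr = 8000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 440 * t)       # the "speaker" to keep
interferer = np.sin(2 * np.pi * 2000 * t)  # the "speaker" to suppress
embedding = (np.arange(N_FFT // 2 + 1) < 32).astype(float)  # keeps bins below 1 kHz
refined = separate(target + interferer, embedding)
```

In this toy setting `refined` closely follows `target` away from the signal edges, because the hand-made mask zeroes every bin that carries the 2000 Hz tone. Passing a different callable as `mask_model` is where a trained model would plug in.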
