
End-To-End Multi-Talker Overlapping Speech Recognition



Abstract

A method for training a speech recognition model with a loss function includes receiving an audio signal including a first segment corresponding to audio spoken by a first speaker, a second segment corresponding to audio spoken by a second speaker, and an overlapping region where the first segment overlaps the second segment. The overlapping region includes a known start time and a known end time. The method also includes generating a respective masked audio embedding for each of the first and second speakers. The method also includes applying a masking loss after the known end time to the respective masked audio embedding for the first speaker when the first speaker was speaking prior to the known start time, or applying the masking loss prior to the known start time when the first speaker was speaking after the known end time.
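The masking rule in the abstract can be illustrated with a minimal sketch. The abstract does not say how the masking loss is computed, so the mean squared magnitude used here is an assumption, as are the hypothetical function name `masking_loss`, the `spoke_first` flag, and the representation of the known start and end times as frame indices. The idea shown is only which frames get penalized: a speaker who spoke before the overlap is penalized after the known end time, and a speaker who spoke after the overlap is penalized before the known start time.

```python
from typing import List

Embedding = List[List[float]]  # frames x dimensions (assumed layout)


def masking_loss(
    embedding: Embedding,
    overlap_start: int,
    overlap_end: int,
    spoke_first: bool,
) -> float:
    """Penalize embedding energy in frames where this speaker should be silent.

    Assumption: the loss is the mean squared value of the masked audio
    embedding over the penalized frames. The source only states *where*
    the loss is applied, not its functional form.
    """
    if not 0 <= overlap_start <= overlap_end <= len(embedding):
        raise ValueError("overlap bounds must lie within the embedding")

    if spoke_first:
        # Speaker was active before the overlap: penalize after the known end time.
        silent_frames = embedding[overlap_end:]
    else:
        # Speaker was active after the overlap: penalize before the known start time.
        silent_frames = embedding[:overlap_start]

    count = sum(len(frame) for frame in silent_frames)
    if count == 0:
        return 0.0
    total = sum(value * value for frame in silent_frames for value in frame)
    return total / count


# Toy data: six frames, two dimensions, overlap on frames 2..3.
first_speaker = [[1.0, 1.0]] * 4 + [[0.5, 0.5]] * 2   # leaks after the overlap
second_speaker = [[0.2, 0.0]] * 2 + [[1.0, 1.0]] * 4  # leaks before the overlap

loss_first = masking_loss(first_speaker, 2, 4, spoke_first=True)
loss_second = masking_loss(second_speaker, 2, 4, spoke_first=False)
total_masking_loss = loss_first + loss_second
```

In a real training setup this term would presumably be added to the recognition loss for each speaker's output; how the two are weighted is not stated in the abstract.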


Bibliographic details

Publication number: US2021343273A1

Publication date: 2021-11-04

Original format: PDF
Applicant/assignee: GOOGLE LLC

Application number: US202016865075
Inventors: ANSHUMAN TRIPATHI; HAN LU; HASIM SAK

Filing date: 2020-05-01
Classification: G10L15/06; G10L15/16; G10L15/04; G06N3/08; G06N20
Country: US
Date indexed: 2022-08-24 22:04:41
