
Convolutional neural network with phonetic attention for speaker verification



Abstract

Embodiments may include reception of a plurality of speech frames, determination of a multi-dimensional acoustic feature associated with each of the plurality of speech frames, determination of a plurality of multi-dimensional phonetic features, each of the plurality of multi-dimensional phonetic features determined based on a respective one of the plurality of speech frames, generation of a plurality of two-dimensional feature maps based on the phonetic features, input of the feature maps and the plurality of acoustic features to a convolutional neural network, the convolutional neural network to generate a plurality of speaker embeddings based on the plurality of feature maps and the plurality of acoustic features, aggregation of the plurality of speaker embeddings into a first speaker embedding based on respective weights determined for each of the plurality of speaker embeddings, and determination of a speaker associated with the plurality of speech frames based on the first speaker embedding.
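
The last two steps of the abstract (weighted aggregation of the per-frame speaker embeddings, then determination of the speaker) can be illustrated with a minimal sketch. The abstract does not say how the weights are computed or how the speaker is chosen, so everything specific here is an assumption made for illustration: the weights are a softmax over dot products with a hypothetical `query` vector, the speaker is picked by cosine similarity against hypothetical `enrolled` embeddings, and the toy embeddings stand in for the convolutional network's output.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_embeddings(embeddings, query):
    """Collapse per-frame speaker embeddings into one utterance-level embedding.

    Assumption: each weight is the softmax-normalised dot product between a
    frame embedding and a hypothetical learned query vector.
    """
    scores = [sum(e * q for e, q in zip(emb, query)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
              for d in range(dim)]
    return pooled, weights

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def identify_speaker(utterance_embedding, enrolled):
    """Return the enrolled name with the highest cosine score (assumed scoring rule)."""
    return max(enrolled, key=lambda name: cosine(utterance_embedding, enrolled[name]))

# Toy per-frame embeddings standing in for the CNN output (illustrative values only).
frame_embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
query = [1.0, 0.0]  # hypothetical attention query
pooled, weights = aggregate_embeddings(frame_embeddings, query)

enrolled = {"alice": [1.0, 0.1], "bob": [0.1, 1.0]}  # hypothetical enrolled speakers
speaker = identify_speaker(pooled, enrolled)
```

Frames whose embeddings align with the query receive larger weights, so they dominate the pooled embedding; a uniform weight vector would reduce this to plain average pooling.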


Bibliographic data

Publication number: US11276410B2

Publication date: 2022-03-15

Original format: PDF
Applicant/assignee: MICROSOFT TECHNOLOGY LICENSING LLC

Application number: US201916682921
Inventors: YONG ZHAO; TIANYAN ZHOU; JINYU LI; YIFAN GONG; JIAN WU; ZHUO CHEN

Filing date: 2019-11-13
Classification: G10L17/14; G10L17/18; G06N3/08; G10L17/02
Country: US
Database entry time: 2024-06-14 22:48:15
