Text independent speaker-verification on a media operating system using deep learning on raw waveforms

Abstract

An artificial neural network architecture is provided for processing raw audio waveforms to create speaker representations that are used for text-independent speaker verification and recognition. The artificial neural network architecture includes a strided convolution layer, first and second sequentially connected residual blocks, a transformer layer, and a final fully connected (FC) layer. The strided convolution layer is configured to receive raw audio waveforms from a speaker. The first and the second residual blocks both include multiple convolutional and max pooling layers. The transformer layer is configured to aggregate frame level embeddings to an utterance level embedding. The output of the FC layer creates a speaker representation for the speaker whose raw audio waveforms were inputted into the strided convolution layer.
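The abstract fixes only the order of the layers: strided convolution, two residual blocks, a transformer layer that turns frame-level embeddings into an utterance-level embedding, and an FC layer. Below is a minimal pure-Python forward pass of that pipeline under heavily simplified assumptions. The kernel size, stride, channel counts, the conv/ReLU/conv layout inside each residual block, the single-head self-attention followed by mean pooling (standing in for the transformer layer), and the random untrained weights are all hypothetical. The names `ToySpeakerNet`, `cosine` and `verify` and the 0.7 threshold are also hypothetical. Cosine scoring is one common way to compare speaker embeddings for verification, but the abstract does not say which scoring the patent uses.

```python
import math
import random

# Hypothetical toy sizes; the abstract specifies none of these.
CHANNELS, EMBED_DIM = 8, 4  # per-frame channels, speaker-representation length
KERNEL, STRIDE = 16, 8      # strided front-end convolution (assumed)


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rand_conv(rng):
    """Weights for a kernel-size-3 conv, indexed [c_out][tap][c_in]."""
    return [rand_matrix(3, CHANNELS, rng) for _ in range(CHANNELS)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def strided_conv(wave, kernels, stride):
    """1-D strided convolution over raw samples -> frames, shape [T][CHANNELS]."""
    k = len(kernels[0])
    return [matvec(kernels, wave[s:s + k]) for s in range(0, len(wave) - k + 1, stride)]


def conv3(frames, weights):
    """Kernel-size-3 conv along time, zero-padded so the frame count is kept."""
    zero = [0.0] * len(frames[0])
    padded = [zero] + frames + [zero]
    return [[sum(sum(a * b for a, b in zip(w_out[tap], padded[t + tap])) for tap in range(3))
             for w_out in weights] for t in range(len(frames))]


def residual_block(frames, w1, w2):
    """conv -> ReLU -> conv, add the skip path, ReLU, then max-pool time by 2."""
    h = conv3([[max(0.0, x) for x in v] for v in conv3(frames, w1)], w2)
    h = [[max(0.0, a + b) for a, b in zip(x, y)] for x, y in zip(h, frames)]
    return [[max(a, b) for a, b in zip(h[t], h[t + 1])] for t in range(0, len(h) - 1, 2)]


def attention_pool(frames, wq, wk, wv):
    """Single-head self-attention plus residual, then mean over frames
    (an assumed stand-in for the transformer aggregation layer)."""
    q = [matvec(wq, f) for f in frames]
    k = [matvec(wk, f) for f in frames]
    v = [matvec(wv, f) for f in frames]
    scale = 1.0 / math.sqrt(len(q[0]))
    pooled = [0.0] * len(frames[0])
    for f, qi in zip(frames, q):
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        att = [sum(w * vj[c] for w, vj in zip(weights, v)) / total for c in range(len(f))]
        pooled = [p + x + a for p, x, a in zip(pooled, f, att)]
    return [p / len(frames) for p in pooled]  # frame level -> utterance level


class ToySpeakerNet:
    """Strided conv -> two residual blocks -> attention pooling -> FC layer."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.front = rand_matrix(CHANNELS, KERNEL, rng)
        self.blocks = [(rand_conv(rng), rand_conv(rng)) for _ in range(2)]
        self.wq, self.wk, self.wv = (rand_matrix(CHANNELS, CHANNELS, rng) for _ in range(3))
        self.fc = rand_matrix(EMBED_DIM, CHANNELS, rng)

    def embed(self, wave):
        frames = strided_conv(wave, self.front, STRIDE)
        if len(frames) < 4:  # two halving max-pools need at least 4 frames
            raise ValueError("waveform too short")
        for w1, w2 in self.blocks:
            frames = residual_block(frames, w1, w2)
        return matvec(self.fc, attention_pool(frames, self.wq, self.wk, self.wv))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norms, 1e-12)


def verify(model, enrol_wave, test_wave, threshold=0.7):
    """Accept the claimed identity if the cosine score clears a hypothetical threshold."""
    return cosine(model.embed(enrol_wave), model.embed(test_wave)) >= threshold


if __name__ == "__main__":
    net = ToySpeakerNet(seed=0)
    wave = [math.sin(0.05 * i) for i in range(400)]
    print(net.embed(wave), verify(net, wave, wave))
```

For a 400-sample input, the strided convolution produces 49 frames, the two residual blocks halve the time axis (49 → 24 → 12), and the attention pooling reduces those 12 frames to a single utterance-level vector before the FC layer maps it to `EMBED_DIM` values. Waveforms shorter than 40 samples are rejected because they would leave no frames after the second pooling stage. Because pooling removes the time axis, utterances of different lengths map to representations of the same size and can be compared directly.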

Bibliographic data

Publication number: US10699715B1

Publication date: 2020-06-30

Original format: PDF
Applicant/Assignee: ALPHONSO INC.

Application number: US202016773427
Inventors: AASHIQ MUHAMED; SUSMITA GHOSE

Filing date: 2020-01-27
Classification: G10L17/18; G06N3/04; G10L17/04; G10L17
Country: US
