Text independent speaker-verification on a media operating system using deep learning on raw waveforms
Abstract
An artificial neural network architecture is provided for processing raw audio waveforms to create speaker representations that are used for text-independent speaker verification and recognition. The architecture includes a strided convolution layer, first and second sequentially connected residual blocks, a transformer layer, and a final fully connected (FC) layer. The strided convolution layer is configured to receive raw audio waveforms from a speaker. The first and second residual blocks each include multiple convolutional and max pooling layers. The transformer layer is configured to aggregate frame-level embeddings into a single utterance-level embedding. The output of the FC layer is the speaker representation for the speaker whose raw audio waveforms were input to the strided convolution layer.
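To make the data flow described in the abstract concrete, the following minimal sketch traces how the number of time frames might shrink as a raw waveform passes through the strided convolution and the two residual blocks, before the transformer layer collapses the frames into one utterance-level embedding. The abstract gives no kernel sizes, strides, pool sizes, or layer counts, so every number below (16 kHz audio, kernel 3, stride 3, pool size 3, two and four pooling layers per block) is a hypothetical choice for illustration only, and the helper names `conv1d_out_len` and `trace_frames` are likewise invented here. Convolutions inside the residual blocks are assumed to preserve length.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling window (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def trace_frames(num_samples, front_end, blocks):
    """Return the number of time frames remaining after each stage.

    front_end: (kernel, stride) of the strided convolution layer.
    blocks:    one list per residual block; each entry is the size of a
               max pooling layer (stride equals pool size). Convolutions
               inside a block are ASSUMED to keep the length unchanged.
    """
    lengths = {"input": num_samples}
    kernel, stride = front_end
    frames = conv1d_out_len(num_samples, kernel, stride)
    lengths["strided_conv"] = frames
    for i, pools in enumerate(blocks, start=1):
        for pool in pools:
            frames = conv1d_out_len(frames, pool, pool)
        lengths[f"residual_block_{i}"] = frames
    # The transformer layer aggregates all frame-level embeddings
    # into a single utterance-level embedding.
    lengths["utterance_embedding"] = 1
    return lengths


# Hypothetical settings: 2 s of 16 kHz audio, kernel 3 / stride 3 front end,
# two pooling layers in the first block and four in the second.
shapes = trace_frames(32000, front_end=(3, 3), blocks=[[3, 3], [3, 3, 3, 3]])
for stage, frames in shapes.items():
    print(f"{stage}: {frames}")
```

Under these assumed settings the time axis is reduced by several orders of magnitude before aggregation, which suggests why the frame-level embeddings can be handled by a transformer layer and then mapped by the FC layer to a fixed-size speaker representation regardless of utterance length.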