Conference: Annual Conference of the International Speech Communication Association

Stacked long-term TDNN for Spoken Language Recognition



Abstract

This paper introduces a stacked architecture that uses a time delay neural network (TDNN) to model long-term patterns for spoken language identification. The first component of the architecture is a feed-forward neural network with a bottleneck layer that is trained to classify context-dependent phone states (senones). The second component is a TDNN that takes the output of the bottleneck, concatenated over a long time span, and produces a posterior probability over the set of languages. The use of a TDNN architecture provides an efficient model to capture discriminative patterns over a wide temporal context. Experimental results are presented using the audio data from the language i-vector challenge (IVC) recently organized by NIST. The proposed system outperforms a state-of-the-art shifted delta cepstra i-vector system and provides complementary information to fuse with the new generation of bottleneck-based i-vector systems that model short-term dependencies.
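The second stage described above can be illustrated with a minimal NumPy sketch: bottleneck features are spliced at several time offsets by successive time-delay layers, and the result is mapped to a posterior over languages. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature dimension, the layer widths, the context offsets, the number of languages, the random (untrained) weights, and the choice to average frame-level posteriors over the utterance. The helper names `tdnn_layer` and `language_posteriors` are hypothetical.

```python
import numpy as np

def tdnn_layer(x, weights, bias, offsets):
    """One time-delay layer: splice frames at the given offsets, then affine + ReLU.

    x: (T, D_in); weights: (len(offsets) * D_in, D_out); bias: (D_out,).
    Only center frames whose full context lies inside the sequence are kept,
    so the output has fewer frames than the input.
    """
    lo, hi = min(offsets), max(offsets)
    num_frames = x.shape[0]
    centers = np.arange(max(0, -lo), num_frames - max(0, hi))
    spliced = np.concatenate([x[centers + o] for o in offsets], axis=1)
    return np.maximum(spliced @ weights + bias, 0.0)

def language_posteriors(bottleneck, layers, out_w, out_b):
    """Run stacked time-delay layers, then softmax and average over time.

    Averaging frame-level posteriors is an assumption of this sketch.
    """
    h = bottleneck
    for w, b, offsets in layers:
        h = tdnn_layer(h, w, b, offsets)
    logits = h @ out_w + out_b                    # (T', n_languages)
    logits = logits - logits.max(axis=1, keepdims=True)
    post = np.exp(logits)
    post = post / post.sum(axis=1, keepdims=True)
    return post.mean(axis=0)

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n_frames, bn_dim, hidden, n_lang = 300, 40, 64, 10
layer_offsets = [(-2, -1, 0, 1, 2), (-4, 0, 4), (-8, 0, 8)]

# Stand-in for the bottleneck outputs of the senone-classifying network.
bottleneck = rng.standard_normal((n_frames, bn_dim))

layers = []
d_in = bn_dim
for offsets in layer_offsets:
    w = rng.standard_normal((len(offsets) * d_in, hidden)) * 0.1
    layers.append((w, np.zeros(hidden), offsets))
    d_in = hidden

out_w = rng.standard_normal((hidden, n_lang)) * 0.1
out_b = np.zeros(n_lang)

post = language_posteriors(bottleneck, layers, out_w, out_b)
first = tdnn_layer(bottleneck, layers[0][0], layers[0][1], layer_offsets[0])
```

With these offsets each output frame sees 2 + 4 + 8 = 14 input frames on either side, so widening the offsets in the higher layers extends the temporal context without increasing the number of weights per layer.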
