Conference: Annual Conference of the International Speech Communication Association

Articulatory Feature Extraction using CTC to build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition

Abstract

Articulatory features provide robustness to speaker and environment variability by incorporating speech production knowledge. Pseudo-articulatory features are articulatory features extracted using articulatory classifiers trained on speech data. One of the major problems in building articulatory classifiers is the need for speech data aligned with articulatory feature values at the frame level. Manually aligning data at the frame level is tedious, and alignments derived from phone alignments through a phone-to-articulatory-feature mapping are prone to errors. In this paper, a technique is proposed that uses the connectionist temporal classification (CTC) criterion to train an articulatory classifier based on a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). The CTC criterion eliminates the need for forced frame-level alignments. Articulatory classifiers were also built with frame-level alignments using different neural network architectures, namely deep neural networks (DNN), convolutional neural networks (CNN) and BLSTM, and were compared to the proposed CTC approach. Among these architectures, articulatory features extracted using classifiers built with BLSTM gave better recognition performance. Further, the proposed approach of BLSTM with CTC gave the best overall performance on both the SVitchboard (6 hours) and Switchboard (33 hours) data sets.
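To make concrete why a CTC criterion needs no frame-level alignment, the following is a minimal sketch of the standard CTC forward computation in plain Python. It is not the paper's implementation: the function names, the blank index, the toy posteriors and the "stop"/"vowel" class labels are all assumptions made for illustration. The sketch sums the probability of a target label sequence over every frame-level path that collapses to it, so only the unaligned label sequence is needed as supervision.

```python
BLANK = 0  # index reserved for the CTC blank symbol (an assumption of this sketch)


def collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return tuple(s for s in merged if s != BLANK)


def ctc_prob(probs, labels):
    """Probability of `labels` summed over all frame alignments (CTC forward pass).

    probs[t][k] is the classifier's posterior for symbol k at frame t.
    """
    # Interleave blanks around the labels: l1 l2 -> _ l1 _ l2 _
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    size = len(ext)

    # A path may start in the leading blank or in the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends in the last label or in the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Toy posteriors over {blank, "stop", "vowel"} for 4 frames (made-up numbers).
probs = [
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.1, 0.6],
    [0.2, 0.1, 0.7],
]
target = [1, 2]  # "stop" followed by "vowel", with no frame boundaries given
likelihood = ctc_prob(probs, target)
```

Training would minimize the negative log of this quantity with respect to the network outputs; in practice a library CTC loss working in the log domain would normally be used rather than a hand-written recursion like this one.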
