Articulatory Feature Extraction using CTC to build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition

Abstract

Articulatory features provide robustness to speaker and environment variability by incorporating speech production knowledge. Pseudo-articulatory features are articulatory features extracted with articulatory classifiers trained on speech data. One of the major problems in building articulatory classifiers is the requirement for speech data aligned at the frame level in terms of articulatory feature values. Manually aligning data at the frame level is a tedious task, and alignments derived from phone alignments through a phone-to-articulatory feature mapping are prone to errors. In this paper, a technique is proposed that uses the connectionist temporal classification (CTC) criterion to train an articulatory classifier based on a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). The CTC criterion eliminates the need for forced frame-level alignments. Articulatory classifiers were also built with frame-level alignments using different neural network architectures, namely deep neural networks (DNN), convolutional neural networks (CNN) and BLSTM, and were compared with the proposed CTC approach. Among these architectures, articulatory features extracted using classifiers built with BLSTM gave better recognition performance. Further, the proposed approach of BLSTM with CTC gave the best overall performance on both the SVitchboard (6 hours) and Switchboard (33 hours) data sets.
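
The abstract's central claim is that the CTC criterion removes the need for frame-level alignments. A minimal pure-Python sketch of why: CTC scores an unaligned label sequence by summing the probability of every frame-level path that collapses to it (merge repeated symbols, then drop blanks). The function names, the blank index 0, and the toy posteriors below are assumptions made for illustration; they are not taken from the paper, and a real system would compute this in the log domain on the outputs of a trained network.

```python
from itertools import product

BLANK = 0  # assumed index of the CTC blank symbol in this sketch


def collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_likelihood(probs, labels):
    """Sum the probability of all alignments of `labels` (forward algorithm).

    probs:  T per-frame distributions over symbols (T >= 1, index 0 = blank).
    labels: target symbol indices without blanks, e.g. articulatory values.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for s in labels:
        ext += [s, BLANK]
    S = len(ext)

    # A path may start in the leading blank or in the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # Skip over a blank, allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A path may end in the last label or in the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def brute_force(probs, labels):
    """Reference check: enumerate every path and keep those that collapse to labels."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path) == list(labels):
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            total += p
    return total


# Two frames, one non-blank symbol, uniform posteriors.
# Valid paths for the label [1]: (1, blank), (blank, 1), (1, 1).
print(ctc_likelihood([[0.5, 0.5], [0.5, 0.5]], [1]))  # 0.75
```

Training with the CTC criterion minimises the negative logarithm of this quantity, so the supervision is only the ordered sequence of labels for an utterance and no per-frame label is ever required.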

Bibliographic details

Source: Annual Conference of the International Speech Communication Association | 2016 | p745-1531 | 5 pages
Authors: Basil Abraham; S. Umesh; Neethu Mariam Joy
Original format: PDF
Chinese Library Classification: TB95-53
Date added: 2022-08-21 11:41:05

