Mandarin speech recognition using convolution neural network with augmented tone features

Abstract

Because it can reduce spectral variations and model the spectral correlations present in speech signals, the convolutional neural network (CNN) has been shown to model speech more effectively than the deep neural network (DNN). In this study, we explore applying the CNN to Mandarin speech recognition. Besides exploring CNN architectures appropriate for recognition performance, we focus on investigating effective acoustic features and on the effectiveness of adding tonal information, which has been verified as helpful in other types of acoustic models, to the acoustic features of the CNN. We conduct experiments on Mandarin broadcast speech recognition to test the effectiveness of the proposed approaches. The CNN shows clear superiority over the DNN, with relative character error rate (CER) reductions of 7.7–13.1% for broadcast news (BN) speech and 5.4–9.9% for broadcast conversation (BC) speech. As in Gaussian mixture model (GMM) and DNN systems, the tonal information characterized by the fundamental frequency (F0) and fundamental frequency variations (FFV) is still found to be helpful in CNN models: it achieves relative CER reductions of more than 6.7% for BN and 4.3% for BC, respectively, compared with the baseline Mel-filter bank feature.
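
The abstract reports its results as relative CER reductions. As a minimal sketch of how such a figure is usually computed (not the authors' scoring pipeline), the Python below derives CER from a character-level Levenshtein distance and then compares two systems. The function names and the example strings and rates are hypothetical and chosen only for illustration; none of them come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference characters with an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative error reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair with one substituted character.
    print(cer("今天天气很好", "今天天汽很好"))
    # Hypothetical CERs for a baseline system and an improved system.
    print(relative_reduction(0.20, 0.18))
```

Under this definition, a relative reduction of 10% means the improved system makes one tenth fewer character errors than the baseline, which is different from a 10-point absolute drop in CER.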

Bibliographic details

Source
International Symposium on Chinese Spoken Language Processing | 2014 | pp. 15-18 | 4 pages
Authors
Hu Xinhui; Lu Xugang; Hori Chiori
Original format: PDF
Keywords
Decision support systems; Radio frequency; Rail to rail inputs; CNN; F0; FFV; Mandarin speech recognition; tonal feature

Similar literature

1. Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network [J]. Xiao-Dong Wang, Keikichi Hirose, Jin-Song Zhang. IEICE Transactions on Information and Systems. 2008, no. 6
2. Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network [J]. Xiao-Dong Wang, Keikichi Hirose, Jin-Song Zhang. 電子情報通信学会技術研究報告. 音声. Speech. 2006, no. 443
3. Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network [J]. Xiao-Dong Wang, Keikichi Hirose, Jin-Song Zhang. 電子情報通信学会技術研究報告. 言語理解とコミュニケーション. Natural Language Understanding and Models of Communication. 2006, no. 441
4. Mandarin speech recognition using convolution neural network with augmented tone features [C]. Hu Xinhui, Lu Xugang, Hori Chiori. International Symposium on Chinese Spoken Language Processing. 2014
5. Applications of convolutional neural networks to facial detection and recognition for augmented reality and wearable computing [D]. Mitchell, Christopher. 2010
6. The Binaural Masking-Level Difference of Mandarin Tone Detection and the Binaural Intelligibility-Level Difference of Mandarin Tone Recognition in the Presence of Speech-Spectrum Noise [O]. Cheng-Yu Ho, Pei-Chun Li, Yuan-Chuan Chiang
7. Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech [O]. Neumann, Michael; Vu, Ngoc Thang. 2017
8. Speech Recognition Using Kohonen Neural Networks, Dynamic Programming and Multi-Feature Fusion [R]. Stowe, F. S. 1990
