Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks

Ju Lin; Wei Li; Yingming Gao; Yanlu Xie; Nancy F. Chen; Sabato Marco Siniscalchi; Jinsong Zhang; Chin-Hui Lee

Abstract

In this paper, we investigate the effectiveness of articulatory information for Mandarin tone modeling and recognition in a deep neural network – hidden Markov model (DNN-HMM) framework. Conventional approaches build tone classifiers from prosodic evidence (e.g., F0, duration, and energy). Here we propose performance enhancement techniques in three areas: (i) adding articulatory features (AFs) and acoustic features, such as MFCCs (Mel-frequency cepstral coefficients), for tone modeling; (ii) adopting phone-dependent tone modeling; and (iii) using a tone-based extended recognition network (ERN) to reduce the tone search space. The first approach is feature-related: it explicitly employs the AFs as a form of tonal features and is implemented through a multi-stage procedure. The second approach is model-related and directly extends to phone-dependent tone modeling, so that each modeling unit (e.g., a tonal phone) not only contains tone information but also integrates phone/articulatory information. Finally, the third technique is search-related and uses a phone-dependent, tone-based expanded search network. A series of comprehensive experiments is conducted using different input feature sets. The results demonstrate that (i) tone recognition accuracy is boosted by incorporating articulatory information, and (ii) the ERN attains the lowest tone error rate of 7.17%, a 56% relative error reduction from the prosody-only baseline system error of 16.36%.
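
The reported 56% figure can be checked directly from the two error rates given in the abstract. The short sketch below is only an arithmetic illustration; the function name `relative_error_reduction` is a hypothetical helper and is not taken from the paper.

```python
def relative_error_reduction(baseline_ter: float, improved_ter: float) -> float:
    """Return the error reduction as a fraction of the baseline error rate."""
    if baseline_ter <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_ter - improved_ter) / baseline_ter


# Tone error rates (in percent) as stated in the abstract.
baseline = 16.36  # prosody-only baseline system
best = 7.17       # lowest tone error rate, obtained with the ERN

rer = relative_error_reduction(baseline, best)
print(f"absolute reduction: {baseline - best:.2f} percentage points")
print(f"relative reduction: {rer:.1%}")
```

The absolute gap is about 9.19 percentage points; dividing by the 16.36% baseline gives roughly 0.56, which matches the 56% relative reduction quoted above.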

Bibliographic details

Source
Journal of VLSI signal processing systems | 2018, Issue 7 | pp. 1077-1087 | 11 pages
Authors
Ju Lin; Wei Li; Yingming Gao; Yanlu Xie; Nancy F. Chen; Sabato Marco Siniscalchi; Jinsong Zhang; Chin-Hui Lee
Author affiliations

College of Information Sciences, Beijing Language and Culture University;

School of Electrical and Computer Engineering, Georgia Institute of Technology;

College of Information Sciences, Beijing Language and Culture University;

College of Information Sciences, Beijing Language and Culture University;

Institute for Infocomm Research;

School of Electrical and Computer Engineering, Georgia Institute of Technology, and Department of Telematics, Kore University of Enna;

College of Information Sciences, Beijing Language and Culture University;

School of Electrical and Computer Engineering, Georgia Institute of Technology;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Articulatory features; MFCC; Posterior probabilities; Deep neural network; Mandarin tone recognition; Tone-based extended recognition network;
