Journal: Journal of VLSI signal processing systems

Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks

Abstract

In this paper, we investigate the effectiveness of articulatory information for Mandarin tone modeling and recognition in a deep neural network-hidden Markov model (DNN-HMM) framework. Conventional approaches build tone classifiers from prosodic evidence (e.g., F0, duration, and energy). Here we propose performance enhancement techniques in three areas: (i) adding articulatory features (AFs) and acoustic features, such as Mel-frequency cepstral coefficients (MFCCs), for tone modeling; (ii) adopting phone-dependent tone modeling; and (iii) using a tone-based extended recognition network (ERN) to reduce the tone search space. The first approach is feature-related: it explicitly employs the AFs as a form of tonal features and is implemented through a multi-stage procedure. The second approach is model-related: it extends directly to phone-dependent tone modeling, so that each modeling unit (e.g., a tonal phone) not only contains tone information but also integrates phone/articulatory information. Finally, the third technique is search-related: it uses a phone-dependent, tone-based expanded search network. A series of comprehensive experiments is conducted using different input feature sets. The results demonstrate that (i) incorporating articulatory information boosts tone recognition accuracy, and (ii) the ERN attains the lowest tone error rate of 7.17%, a 56% relative error reduction from the prosody-only baseline system's error rate of 16.36%.
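
The relative error reduction quoted in the abstract can be checked from the two reported tone error rates. The following is a minimal sketch of that arithmetic only; the function and variable names are illustrative assumptions and do not come from the paper.

```python
def relative_error_reduction(baseline: float, improved: float) -> float:
    """Return the fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Tone error rates (%) as reported in the abstract.
baseline_ter = 16.36  # prosody-only baseline system
ern_ter = 7.17        # lowest tone error rate, obtained with the ERN

reduction = relative_error_reduction(baseline_ter, ern_ter)
print(f"Relative error reduction: {reduction:.1%}")  # about 56%, matching the abstract
```

Note that this is a relative figure: the absolute drop in tone error rate is 16.36 - 7.17 = 9.19 percentage points.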