Conference: International Symposium on Chinese Spoken Language Processing

Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion

Abstract

Acoustic-to-articulatory inversion, which predicts articulatory movement from the acoustic signal, is useful for many applications such as talking heads, speech recognition, and education. DNN-based techniques have achieved state-of-the-art performance in this area. This paper investigates different stacked network architectures for acoustic-to-articulatory inversion. Two levels of DNNs or mixture density networks (MDNs) can be connected using different types of auxiliary features extracted from the first-level network, including bottleneck features, directly generated features, and articulatory features predicted with the maximum likelihood parameter generation (MLPG) algorithm. In the experiments, stacked systems using DNNs, time-delay DNNs (TDNNs), RNNs, and MDNs were evaluated on both the MNGU0 English EMA database and the AIMSL Chinese EMA database. On the default configuration of the MNGU0 data with LSF acoustic features, the proposed stacked system using feed-forward MDNs with ellipsoid variance and MLPG-generated features achieved an RMSE of 0.718 mm, comparable to the RNN and RNN-MDN BLSTM systems, which have a slower and more difficult training stage.
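
As a rough illustration of the two-level idea, the sketch below shows only the data flow of stacking with "directly generated features": the first-level network's articulatory estimate is appended to the acoustic input of the second-level network. It is a minimal sketch under stated assumptions, not the paper's system. The networks are untrained with random weights, the data are random, and the dimensions, layer sizes, and helper names (`init_mlp`, `mlp_forward`, `rmse`) are hypothetical. MDN outputs, MLPG, and bottleneck features are not modelled, and the paper's exact RMSE averaging convention is not given in the abstract.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random (untrained) weights; sizes = [input, hidden..., output]."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, weights):
    """Feed-forward pass: tanh hidden layers, linear output layer."""
    h = x
    for W, b in weights[:-1]:
        h = np.tanh(h @ W + b)
    W, b = weights[-1]
    return h @ W + b

def rmse(pred, target):
    """Root-mean-square error pooled over all frames and channels
    (an assumed convention for this sketch)."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

rng = np.random.default_rng(0)

# Hypothetical sizes: frames, acoustic dims, articulatory channels.
n_frames, n_acoustic, n_artic = 200, 40, 12
acoustic = rng.standard_normal((n_frames, n_acoustic))  # stand-in for acoustic features
target = rng.standard_normal((n_frames, n_artic))       # stand-in for EMA trajectories

# Level 1: acoustic features -> first articulatory estimate.
level1 = init_mlp([n_acoustic, 64, n_artic], rng)
aux = mlp_forward(acoustic, level1)

# Level 2: acoustic features plus auxiliary features -> refined estimate.
stacked_input = np.concatenate([acoustic, aux], axis=1)
level2 = init_mlp([n_acoustic + n_artic, 64, n_artic], rng)
pred = mlp_forward(stacked_input, level2)

print(stacked_input.shape, pred.shape)
print("RMSE of the untrained sketch:", rmse(pred, target))
```

In a real system each level would be trained in turn, and the auxiliary features could instead be taken from a bottleneck layer or from an MLPG-smoothed trajectory, as the abstract describes.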
