Conference: International Symposium on Chinese Spoken Language Processing

Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion

Abstract

Acoustic-to-articulatory inversion, which predicts articulatory movement from the acoustic signal, is useful for many applications such as talking heads, speech recognition, and education. DNN-based techniques have achieved state-of-the-art performance in this area. This paper investigates different stacked network architectures for acoustic-to-articulatory inversion. Two levels of DNNs or mixture density networks (MDNs) can be connected using different types of auxiliary features extracted from the first-level network, including bottleneck features, directly generated features, and articulatory features predicted via the MLPG algorithm. In the experiments, stacked systems using DNNs, time-delay DNNs (TDNNs), RNNs, and MDNs were evaluated on both the MNGU0 English EMA database and the AIMSL Chinese EMA database. On the default configuration of the MNGU0 data with LSF acoustic features, the proposed stacked system using feed-forward MDNs with ellipsoid variance and MLPG-generated features achieved an RMSE of 0.718 mm, which is comparable to the RNN and RNN-MDN BLSTM systems, whose training is slower and more difficult.
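
The two-level stacking described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names (`init_mlp`, `mlp_forward`, `rmse`), the layer sizes, the feature dimensions, and the random untrained weights are all assumptions chosen for illustration, and only the "directly generated features" variant is shown (the first-level output is appended to the acoustic input of the second level). The RMSE helper averages over all frames and channels, which may differ from the exact evaluation protocol used in the paper.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Create random (weight, bias) pairs for a feed-forward net (hypothetical helper)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, weights):
    """Return (linear output, last hidden activation) of a tanh feed-forward net."""
    h = x
    for W, b in weights[:-1]:
        h = np.tanh(h @ W + b)
    W_out, b_out = weights[-1]
    return h @ W_out + b_out, h

def rmse(pred, target):
    """Root-mean-square error over all frames and channels."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

rng = np.random.default_rng(0)

# Assumed sizes: 100 frames, 40 acoustic dims, 12 articulatory channels.
n_frames, n_acoustic, n_artic = 100, 40, 12
acoustic = rng.normal(size=(n_frames, n_acoustic))   # synthetic stand-in for LSF features
target = rng.normal(size=(n_frames, n_artic))        # synthetic stand-in for EMA targets

# Level 1: acoustic -> articulatory estimate (used here as the auxiliary feature).
level1 = init_mlp([n_acoustic, 64, n_artic], rng)
aux, hidden = mlp_forward(acoustic, level1)

# Level 2: acoustic features concatenated with the level-1 auxiliary features.
stacked_input = np.concatenate([acoustic, aux], axis=1)
level2 = init_mlp([n_acoustic + n_artic, 64, n_artic], rng)
pred, _ = mlp_forward(stacked_input, level2)

print(stacked_input.shape, pred.shape)
print("RMSE on synthetic data (untrained nets):", rmse(pred, target))
```

In this sketch the returned `hidden` activation could play the role of a bottleneck feature if it were concatenated instead of `aux`; the MLPG-based variant would additionally require predicted dynamic features and variances, which are omitted here.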