Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion

Abstract

Acoustic-to-articulatory inversion, which predicts articulatory movement from the acoustic signal, is useful for applications such as talking heads, speech recognition, and education. DNN-based techniques have achieved state-of-the-art performance in this area. This paper investigates different stacked network architectures for acoustic-to-articulatory inversion. Two levels of DNNs or mixture density networks (MDNs) can be connected using different types of auxiliary features extracted from the first-level network, including bottleneck features, directly generated features, and articulatory features predicted via the MLPG algorithm. In the experiments, stacked systems using DNNs, time-delay DNNs (TDNNs), RNNs, and MDNs were evaluated on both the MNGU0 English EMA database and the AIMSL Chinese EMA database. With the default configuration of the MNGU0 data and LSF acoustic features, the proposed stacked system using feed-forward MDNs with ellipsoid variance and MLPG-generated features achieved an RMSE of 0.718 mm, which is similar to that of the RNN and RNN-MDN BLSTM systems, whose training is slower and more difficult.
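The stacking idea in the abstract can be illustrated with a minimal sketch: a first-level network produces bottleneck features, these are concatenated with the acoustic frame, and a second-level network maps the stacked input to articulatory positions, scored by RMSE. This is not the authors' implementation. The layer sizes, the random untrained weights, the toy data, and the helper names (`dense`, `random_layer`, `stacked_forward`, `rmse`) are all assumptions made for illustration, and the MDN and MLPG variants are not covered.

```python
import math
import random


def dense(x, weights, bias, activation=math.tanh):
    """One fully connected layer applied to a single frame: activation(Wx + b)."""
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def random_layer(n_in, n_out, rng):
    """Hypothetical random initialisation; a real system would train these weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def identity(value):
    """Linear output activation for regression targets."""
    return value


def rmse(predicted, reference):
    """Root-mean-square error over all frames and articulatory channels."""
    squared = [(p - r) ** 2
               for p_frame, r_frame in zip(predicted, reference)
               for p, r in zip(p_frame, r_frame)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
N_ACOUSTIC, N_BOTTLENECK, N_HIDDEN, N_ARTIC = 40, 8, 16, 12  # assumed sizes

# Level 1: acoustic frame -> narrow bottleneck layer. Its own output layer is
# omitted here because only the bottleneck activations are passed on.
level1_bottleneck = random_layer(N_ACOUSTIC, N_BOTTLENECK, rng)

# Level 2: the input is the acoustic frame plus the level-1 bottleneck features.
level2_hidden = random_layer(N_ACOUSTIC + N_BOTTLENECK, N_HIDDEN, rng)
level2_output = random_layer(N_HIDDEN, N_ARTIC, rng)


def stacked_forward(frame):
    """Predict one articulatory frame with the two-level stacked network."""
    bottleneck = dense(frame, *level1_bottleneck)
    stacked_input = frame + bottleneck  # list concatenation of the two feature sets
    hidden = dense(stacked_input, *level2_hidden)
    return dense(hidden, *level2_output, activation=identity)


# Toy data: random "acoustic" frames and all-zero "articulatory" references.
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_ACOUSTIC)] for _ in range(5)]
reference = [[0.0] * N_ARTIC for _ in frames]
predictions = [stacked_forward(frame) for frame in frames]
print(f"RMSE on toy data: {rmse(predictions, reference):.3f}")
```

In this sketch the only link between the two levels is the concatenation in `stacked_forward`; swapping the bottleneck activations for another auxiliary feature type would change that one line and the second-level input size.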

Bibliographic details

Source
International Symposium on Chinese Spoken Language Processing | 2018 | pp. 36-40 | 5 pages
Authors
Xurong Xie; Xunying Liu; Tan Lee; Lan Wang
Original format: PDF
Keywords
Acoustics; Feature extraction; Training; Databases; Neural networks; Ellipsoids; Speech recognition

Similar literature

1. A new deep neural network based on a stack of single-hidden-layer feedforward neural networks with randomly fixed hidden neurons [J] . Hu Junying, Zhang Jiangshe, Zhang Chunxia, Neurocomputing . 2016, No. JANa1
2. Deep Neural Networks for Iris Recognition System Based on Video: Stacked Sparse Auto Encoders (SSAE) and Bi-Propagation Neural Network Models [J] . Asama Kuder Nseaf, Azizah Jaafar, Khider Nassif Jassim, Journal of Theoretical and Applied Information Technology . 2016, No. 2
3. Stacking-Based Deep Neural Network: Deep Analytic Network for Pattern Classification [J] . Low Cheng-Yaw, Park Jaewoo, Teoh Andrew Beng-Jin, IEEE Transactions on Cybernetics . 2020, No. 12
4. Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion [C] . Xurong Xie, Xunying Liu, Tan Lee, International Symposium on Chinese Spoken Language Processing . 2018
5. Facies Modeling Using 3D Pre-Stack Simultaneous Seismic Inversion and Multi-Attribute Probability Neural Network Transform in the Wattenberg Field, Colorado [D] . Harryandi, Sheila . 2017
6. QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks [O] . Md Hossain Shuvo, Sutanu Bhattacharya, Debswapna Bhattacharya
7. Mixture density networks, human articulatory data and acoustic-to-articulatory inversion of continuous speech [O] . Richmond Korin . 2001
