Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion

Abstract

Acoustic-to-articulatory inversion, which predicts articulatory movements from the acoustic signal, is useful for many applications such as talking heads, speech recognition, and education. DNN-based technologies have achieved state-of-the-art performance in this area. This paper investigates different stacked network architectures for acoustic-to-articulatory inversion. Two levels of DNNs or mixture density networks (MDNs) can be connected using different types of auxiliary features extracted from the first-level network, including bottleneck features, directly generated features, and articulatory features predicted via the maximum likelihood parameter generation (MLPG) algorithm. In the experiments, stacked systems using DNNs, time-delay DNNs (TDNNs), RNNs, and MDNs were evaluated on both the MNGU0 English EMA database and the AIMSL Chinese EMA database. On the default configuration of the MNGU0 data using LSF acoustic features, the proposed stacked system using feed-forward MDNs with ellipsoid variance and MLPG-generated features achieved an RMSE of 0.718 mm, similar to the BLSTM-based RNN and RNN-MDN systems, whose training is slower and more difficult.
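The record gives no implementation details, so the following is only a minimal NumPy sketch of how MLPG-generated auxiliary features could connect two network levels: first-level MDN outputs are turned into smooth articulatory trajectories, which are then concatenated with the acoustic features as input to the second level. Several assumptions are made and are not taken from the paper: the delta window is 0.5 * (c[t+1] - c[t-1]); "ellipsoid variance" is read as a separate variance per output dimension (diagonal covariance); MLPG is applied to the single most probable mixture component per frame rather than to the full mixture; and all function names (`delta_matrix`, `mlpg`, `mdn_to_trajectory`, `stack_inputs`, `rmse`) are hypothetical. The networks themselves are omitted.

```python
import numpy as np


def delta_matrix(num_frames):
    """Return W (2T x T) mapping a static track c to interleaved [static, delta] rows.

    Assumption: delta(t) = 0.5 * (c[t+1] - c[t-1]), with edge frames clamped.
    """
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0  # static coefficient
        prev_t, next_t = max(t - 1, 0), min(t + 1, num_frames - 1)
        W[2 * t + 1, prev_t] -= 0.5  # delta window, left tap
        W[2 * t + 1, next_t] += 0.5  # delta window, right tap
    return W


def mlpg(mean, var):
    """MLPG for one articulatory channel.

    mean, var: arrays of shape (T, 2) with per-frame [static, delta] means and
    variances (diagonal covariance). Solves (W^T U^-1 W) c = W^T U^-1 mu.
    """
    W = delta_matrix(mean.shape[0])
    precision = 1.0 / var.reshape(-1)  # diagonal of U^-1, interleaved like W's rows
    WtP = W.T * precision  # W^T U^-1 via broadcasting over columns
    return np.linalg.solve(WtP @ W, WtP @ mean.reshape(-1))


def most_probable_component(weights, means, variances):
    """Per frame, pick the mean/variance of the highest-weight MDN component.

    weights: (T, M); means, variances: (T, M, F). Returns two (T, F) arrays.
    """
    best = np.argmax(weights, axis=1)
    frames = np.arange(weights.shape[0])
    return means[frames, best], variances[frames, best]


def mdn_to_trajectory(weights, means, variances):
    """First-level MDN outputs -> MLPG trajectories of shape (T, D).

    Assumption: the F = 2 * D outputs are laid out as [D statics, D deltas].
    """
    mu, var = most_probable_component(weights, means, variances)
    T, F = mu.shape
    D = F // 2
    mu, var = mu.reshape(T, 2, D), var.reshape(T, 2, D)
    return np.stack([mlpg(mu[:, :, d], var[:, :, d]) for d in range(D)], axis=1)


def stack_inputs(acoustic, auxiliary):
    """Second-level input: acoustic and auxiliary features concatenated per frame."""
    if acoustic.shape[0] != auxiliary.shape[0]:
        raise ValueError("feature streams must have the same number of frames")
    return np.concatenate([acoustic, auxiliary], axis=1)


def rmse(pred, ref):
    """RMSE over all frames and channels (in mm if the EMA data are in mm)."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, M, D = 50, 4, 12  # frames, mixture components, EMA channels (illustrative sizes)
    weights = rng.dirichlet(np.ones(M), size=T)
    means = rng.normal(size=(T, M, 2 * D))
    variances = rng.uniform(0.1, 1.0, size=(T, M, 2 * D))
    aux = mdn_to_trajectory(weights, means, variances)  # shape (50, 12)
    x2 = stack_inputs(rng.normal(size=(T, 40)), aux)  # shape (50, 52)
    print(aux.shape, x2.shape)
```

Solving the normal equations jointly over all frames is what makes the generated trajectory smooth: each static value is pulled toward its own mean and also constrained by the delta statistics of its neighbors, with low-variance dimensions weighted more heavily.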

Bibliographic details

Source: International Symposium on Chinese Spoken Language Processing | 2018 | p. 504 | 5 pages
Authors: Xurong Xie; Xunying Liu; Tan Lee; Lan Wang
Original format: PDF
CLC classification: TP18-53
Keywords
Acoustics; Feature extraction; Training; Databases; Neural networks; Ellipsoids; Speech recognition;

Similar documents

1. A new deep neural network based on a stack of single-hidden-layer feedforward neural networks with randomly fixed hidden neurons [J]. Hu Junying, Zhang Jiangshe, Zhang Chunxia. Neurocomputing, 2016, no. JANa1.

2. Deep Neural Networks for Iris Recognition System Based on Video: Stacked Sparse Auto Encoders (SSAE) and Bi-Propagation Neural Network Models [J]. Asama Kuder Nseaf, Azizah Jaafar, Khider Nassif Jassim. Journal of Theoretical and Applied Information Technology, 2016, no. 2.

3. Stacking-Based Deep Neural Network: Deep Analytic Network for Pattern Classification [J]. Low Cheng-Yaw, Park Jaewoo, Teoh Andrew Beng-Jin. IEEE Transactions on Cybernetics, 2020, no. 12.

4. Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion [C]. Xurong Xie, Xunying Liu, Tan Lee. International Symposium on Chinese Spoken Language Processing, 2018.

5. Facies Modeling Using 3D Pre-Stack Simultaneous Seismic Inversion and Multi-Attribute Probability Neural Network Transform in the Wattenberg Field, Colorado [D]. Harryandi, Sheila. 2017.

6. QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks [O]. Md Hossain Shuvo, Sutanu Bhattacharya, Debswapna Bhattacharya.

7. Mixture density networks, human articulatory data and acoustic-to-articulatory inversion of continuous speech [O]. Richmond Korin. 2001.
