Multimodal and Multi-Output Deep Learning Architectures for the Automatic Assessment of Voice Quality Using the GRB Scale

Arias-Londono Julian D.; Gomez-Garcia Jorge A.; Godino-Llorente Juan I

Abstract

This article addresses the automatic assessment of voice quality according to the GRB scale, using a variety of deep learning architectures for prediction. The proposed architectures are multimodal, because they employ multiple sources of information, and multi-output, because they simultaneously predict all the traits of the GRB scale. A feature engineering approach is followed, based on deep neural networks and a set of well-established features such as MFCC, perturbation and complexity characteristics. Likewise, a representation learning approach is considered, using convolutional neural networks fed with modulation spectra extracted from voices. Finally, diverse loss functions are also investigated, including two surrogate ordinal classification losses, a conventional weighted categorical cross-entropy, and a mean square error function. Experiments are carried out on a dataset containing recordings of the sustained phonation of three vowels. The best deep learning architecture provides a relative performance improvement of 6.25% for G, 14.1% for R and 18.1% for B, in comparison with recently published results using the same dataset.
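To make the multi-output idea and two of the loss functions named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each trait (G, R, B) is graded on four levels (0 to 3), that the model emits one vector of logits per trait, and that the joint loss is a plain sum over traits. The function names, the class weights and the summation are illustrative assumptions.

```python
import numpy as np

TRAITS = ("G", "R", "B")  # the three traits predicted jointly
NUM_LEVELS = 4            # assumption: each trait is graded 0..3


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def weighted_cross_entropy(logits, labels, class_weights):
    """Categorical cross-entropy where each sample is scaled by the
    weight of its true class (e.g. to counter class imbalance)."""
    probs = softmax(logits)
    true_class_prob = probs[np.arange(labels.shape[0]), labels]
    per_sample = -class_weights[labels] * np.log(true_class_prob + 1e-12)
    return float(per_sample.mean())


def mse_loss(predictions, labels):
    """Mean square error, treating the grade as a numeric target."""
    return float(np.mean((predictions - labels) ** 2))


def multi_output_loss(logits_by_trait, labels_by_trait, weights_by_trait):
    """Assumed joint objective: sum of the per-trait weighted losses."""
    return sum(
        weighted_cross_entropy(
            logits_by_trait[t], labels_by_trait[t], weights_by_trait[t]
        )
        for t in TRAITS
    )


# Toy batch of two voices with uninformative (all-zero) logits.
logits = {t: np.zeros((2, NUM_LEVELS)) for t in TRAITS}
labels = {t: np.array([0, 3]) for t in TRAITS}
weights = {t: np.ones(NUM_LEVELS) for t in TRAITS}
total = multi_output_loss(logits, labels, weights)
```

With all-zero logits every grade receives probability 1/4, so each trait contributes log(4) and the toy total is 3·log(4). The surrogate ordinal losses mentioned in the abstract are not reproduced here because the abstract does not define them.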

Bibliographic details

Source
IEEE Journal of Selected Topics in Signal Processing | 2020, Issue 2 | pp. 413-422 | 10 pages
Authors
Arias-Londono Julian D.; Gomez-Garcia Jorge A.; Godino-Llorente Juan I;
Author affiliations

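Universidad de Antioquia, Department of Systems Engineering, Medellín 050010, Colombia;

Universidad Politécnica de Madrid, Bioengineering and Optoelectronics Laboratory (ByO), Madrid 28031, Spain;

Universidad Politécnica de Madrid, Center for Biomedical Technology, Madrid 28031, Spain;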

Original format: PDF
Language: English
Keywords
Feature extraction; Mel frequency cepstral coefficient; Machine learning; Neural networks; Perturbation methods; Complexity theory; Correlation; Automatic voice quality analysis; perceptual voice assessment; GRB scale; deep neural networks;

Similar literature

1. Automatic Assessment of Pathological Voice Quality Using Multidimensional Acoustic Analysis Based on the GRBAS Scale [J]. Wang Zhijian, Yu Ping, Yan Nan. Journal of signal processing systems for signal, image, and video technology. 2016, Issue 2
2. Deep Multimodality Learning for UAV Video Aesthetic Quality Assessment [J]. Kuang Qi, Jin Xin, Zhao Qinping. IEEE transactions on multimedia. 2020, Issue 10
3. Automatic building change image quality assessment in high resolution remote sensing based on deep learning [J]. Huang Fenghua, Yu Ying, Feng Tinghao. Journal of visual communication & image representation. 2019, Issue Auga
4. Emotion Inferring from Large-scale Internet Voice Data: A Multimodal Deep Learning Approach [C]. Suping Zhou, Jia Jia, Yanfeng Wang. 2018 First Asian Conference on Affective Computing and Intelligent Interaction. 2018
5. Students Developing Voices in New Learning Ecologies: Voice, Identity, Position and Function as a Framework to Support Multimodal Investigations of Learning Mathematics over Multiple Timescales [D]. El Chidiac, Fady. 2018
6. Toward an Automatic Quality Assessment of Voice-Based Telemedicine Consultations: A Deep Learning Approach [O]. Maria Habib, Mohammad Faris, Raneem Qaddoura. 2021
7. Deep Multimodality Learning for UAV Video Aesthetic Quality Assessment [O]. Qi Kuang, Xin Jin, Qinping Zhao. 2020
