Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks

Abstract

Accurately recognizing speaker emotion and age/gender from speech can provide better user experience for many spoken dialogue systems. In this study, we propose to use deep neural networks (DNNs) to encode each utterance into a fixed-length vector by pooling the activations of the last hidden layer over time. The feature encoding process is designed to be jointly trained with the utterance-level classifier for better classification. A kernel extreme learning machine (ELM) is further trained on the encoded vectors for better utterance-level classification. Experiments on a Mandarin dataset demonstrate the effectiveness of our proposed methods on speech emotion and age/gender recognition tasks.
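The two-stage idea in the abstract (pool last-hidden-layer activations over time into one vector per utterance, then train a kernel ELM on those vectors) can be illustrated with a minimal NumPy sketch. Several details here are assumptions, since the abstract does not state them: mean pooling as the pooling operation, ReLU hidden units, an RBF kernel, and the standard closed-form kernel ELM solution. The names `encode_utterance`, `rbf_kernel`, and `KernelELM` are hypothetical, and the random weights only stand in for a trained DNN; the joint training of encoder and classifier is not shown.

```python
import numpy as np

def encode_utterance(frames, weights, biases):
    """Map a (T, D) matrix of frame features to one fixed-length vector.

    Each frame passes through the hidden layers; the last hidden layer's
    activations are then averaged over time (mean pooling is an assumption).
    """
    h = frames
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layer (assumed activation)
    return h.mean(axis=0)  # length equals last hidden layer size, independent of T

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KernelELM:
    """Kernel ELM in closed form: beta = (K + I / C)^{-1} T."""

    def __init__(self, C=100.0, gamma=0.5):
        self.C = C
        self.gamma = gamma

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.X_train = X
        self.classes = np.unique(y)
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes[None, :]).astype(float)
        K = rbf_kernel(X, X, self.gamma)
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = rbf_kernel(X, self.X_train, self.gamma) @ self.beta
        return self.classes[np.argmax(scores, axis=1)]

# Usage with random stand-in weights and synthetic utterances of different lengths.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 3))]
biases = [np.zeros(8), np.zeros(3)]
short_vec = encode_utterance(rng.standard_normal((50, 4)), weights, biases)
long_vec = encode_utterance(rng.standard_normal((120, 4)), weights, biases)
print(short_vec.shape, long_vec.shape)
```

Both utterances map to vectors of the same length even though they contain different numbers of frames, which is what allows a single utterance-level classifier such as the kernel ELM to be trained on them.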
