IEEE International Conference on Acoustics, Speech and Signal Processing

Multimodal Speaker Adaptation of Acoustic Model and Language Model for ASR Using Speaker Face Embedding

Abstract

We present an investigation into the adaptation of the acoustic model and the language model for automatic speech recognition (ASR) using speaker face information for the transcription of a multimedia dataset. We begin by reviewing relevant previous work on the integration of visual signals into ASR systems. Our experimental investigation shows a small improvement in word error rate (WER) when transcribing a collection of instruction videos by adapting the acoustic model and the language model with fixed-length face embedding vectors. We also present potential approaches to integrating human facial information and body gestures into ASR as further directions for research on this topic.
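The abstract does not specify the adaptation architecture, but a common way to condition an acoustic model on a fixed-length speaker embedding is to broadcast the vector across time and concatenate it to every acoustic frame. The PyTorch sketch below illustrates that general scheme; the class name, feature and embedding dimensions, and the CTC-style output layer are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class FaceConditionedAM(nn.Module):
    # Toy acoustic model: per-frame filterbank features are
    # concatenated with a fixed-length speaker face embedding
    # before the recurrent encoder.
    def __init__(self, feat_dim=80, face_dim=128, hidden=256, n_tokens=500):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim + face_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def forward(self, feats, face_emb):
        # feats: (batch, time, feat_dim); face_emb: (batch, face_dim)
        # Broadcast the utterance-level face embedding across all frames.
        face = face_emb.unsqueeze(1).expand(-1, feats.size(1), -1)
        x = torch.cat([feats, face], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)  # per-frame token logits (e.g. for a CTC loss)

Language-model adaptation could condition on the same embedding analogously, for example by concatenating it to the token embeddings of a neural LM; the abstract reports only that adapting both models in this fashion yielded a small WER improvement.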
