Conference: European Signal Processing Conference

Viseme definitions comparison for visual-only speech recognition

Abstract

Audio-visual speech recognition (AVSR) involves recognising what a speaker is uttering using both audio and visual cues. While phonemes, the units of speech in the audio domain, are well documented, the same is not true of the speech units in the visual domain: visemes. The literature agrees only on a generic definition of a viseme. There is no agreement on what visemes mean in practice, or on whether they relate to mouth position or to mouth movement. This paper presents a visual-only speech recognition system trained using either PCA or optical flow visual features. The recognition rate changes depending on which practical viseme definition is used. Four viseme definitions were tested, and the results are analysed to establish which of the four candidates performs best.
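The abstract names PCA as one of the two visual feature types but gives no implementation details. As a rough illustration only, the sketch below shows how PCA features could be computed from a stack of grayscale mouth-region frames with NumPy. The function name `pca_features`, the input layout, the frame size and the number of components are all assumptions made for this example. It is not the authors' pipeline, and random data stands in for real video frames.

```python
import numpy as np


def pca_features(frames, n_components):
    """Project flattened frames onto their top principal components.

    frames: array of shape (num_frames, height, width); this layout is an
    assumption for the sketch, not something stated in the abstract.
    """
    # Flatten each frame into one row vector.
    X = frames.reshape(frames.shape[0], -1).astype(np.float64)
    # Centre the data so the SVD yields principal axes.
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Each frame becomes an n_components-dimensional feature vector.
    return Xc @ components.T, components, mean


# Hypothetical stand-in data: 50 random "mouth region" frames of 16x24 pixels.
rng = np.random.default_rng(0)
frames = rng.random((50, 16, 24))
features, components, mean = pca_features(frames, n_components=10)
```

In a recogniser of the kind described, feature vectors like these would be the per-frame observations passed to the classifier, with the viseme definition determining the class labels.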
