Conference: International Conference on Speech and Computer

Utilizing Lipreading in Large Vocabulary Continuous Speech Recognition


Abstract

The vast majority of current research on audiovisual speech recognition via lipreading from frontal face videos focuses on simple cases such as isolated phrase recognition or structured speech, where the vocabulary is limited to several tens of units. In this paper, we diverge from these traditional applications and investigate the effect of incorporating visual information into continuous speech recognition with vocabulary sizes ranging from several hundred to half a million words. To this end, we evaluate various visual speech parametrizations, both existing and novel, that are designed to capture different kinds of information in the video signal. The experiments are conducted on a moderately sized dataset of 54 speakers, each uttering 100 sentences in Czech. We show that even for large vocabularies the visual signal contains enough information to improve word accuracy by up to 15% relative to acoustic-only recognition.
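The reported gain is relative, not absolute, and the two are easy to confuse. The sketch below shows how a relative improvement in word accuracy could be computed. It assumes the common definition of word accuracy, (N - S - D - I) / N, which the abstract does not state. The function names and all error counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def word_accuracy(n_ref: int, subs: int, dels: int, ins: int) -> float:
    """Word accuracy under the assumed definition (N - S - D - I) / N."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return (n_ref - subs - dels - ins) / n_ref


def relative_improvement(baseline: float, improved: float) -> float:
    """Gain of `improved` over `baseline`, as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical error counts over 1000 reference words (illustrative only).
acoustic_only = word_accuracy(1000, subs=250, dels=100, ins=50)
audiovisual = word_accuracy(1000, subs=200, dels=70, ins=40)

absolute_gain = audiovisual - acoustic_only
relative_gain = relative_improvement(acoustic_only, audiovisual)

print(f"acoustic-only accuracy: {acoustic_only:.2%}")
print(f"audiovisual accuracy:   {audiovisual:.2%}")
print(f"absolute gain:          {absolute_gain:.2%}")
print(f"relative gain:          {relative_gain:.2%}")
```

With these made-up counts, a 9-point absolute gain corresponds to a 15% relative gain, which is why the same result looks larger when stated relative to the baseline.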
