Utilizing Lipreading in Large Vocabulary Continuous Speech Recognition

Abstract

The vast majority of current research on audiovisual speech recognition via lipreading from frontal face videos focuses on simple cases such as isolated phrase recognition or structured speech, where the vocabulary is limited to several tens of units. In this paper, we diverge from these traditional applications and investigate the effect of incorporating visual information into continuous speech recognition with vocabulary sizes ranging from several hundred to half a million words. To this end, we evaluate various visual speech parametrizations, both existing and novel, that are designed to capture different kinds of information in the video signal. The experiments are conducted on a moderately sized dataset of 54 speakers, each uttering 100 sentences in Czech. We show that even for large vocabularies the visual signal contains enough information to improve word accuracy by up to 15% relative to acoustic-only recognition.

Bibliographic details

Source
International Conference on Speech and Computer | 2017 | pp. 767-776 | 10 pages
Author
Karel Palecek
Original format: PDF
Keywords
Audiovisual speech recognition; Lipreading; LVCSR

Similar literature

1. Experimenting with lipreading for large vocabulary continuous speech recognition [J]. Karel Paleček. Journal on Multimodal User Interfaces. 2018, No. 4
2. An improved two-stage mixed language model approach for handling out-of-vocabulary words in large vocabulary continuous speech recognition [J]. Bert Reveil, Kris Demuynck, Jean-Pierre Martens. Computer Speech and Language. 2014, No. 1
3. Effect of Vocabulary Extension using Word Sequence Concatenation for Large Vocabulary Continuous Speech Recognition [J]. Yosuke Wada, Norihiko Kobayashi, Yuichiro Nakano. 情報処理学会論文誌. 1999, No. 4
4. Utilizing Lipreading in Large Vocabulary Continuous Speech Recognition [C]. Karel Palecek. International Conference on Speech and Computer. 2017
5. An Error Detection and Correction Framework to Improve Large Vocabulary Continuous Speech Recognition [D]. Zhou, Zhengyu. 2009
6. Lipreading and Audiovisual Speech Recognition across the Adult Lifespan: Implications for Audiovisual Integration [O]. Nancy Tye-Murray, Brent Spehar, Joel Myerson
7. Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech [O]. Krerksak Likitsupin, Proadpran Punyabukkana, Chai Wutiwiwatchai. 2016
