Conference: International Conference on Speech and Computer
Designing Advanced Geometric Features for Automatic Russian Visual Speech Recognition

Abstract

Video information plays an increasingly important role in automatic speech recognition. Audio-only systems have reached a certain accuracy ceiling, and many researchers see the visual modality as a way to obtain better results. Although the audio modality of speech is much more informative than the video modality, their proper fusion can improve both the quality and the robustness of the entire recognition system, as many studies have shown in practice. However, researchers have not reached agreement on the optimal set of visual features. In this paper, we investigate this issue in more detail and propose advanced geometry-based visual features for an automatic Russian lip-reading system. The experiments were conducted on the collected HAVRUS audio-visual speech database. The average viseme recognition accuracy of our system trained on the entire corpus is 40.62%. We also tested the main state-of-the-art methods for visual speech recognition, applying them to continuous Russian speech recorded at high speed (200 frames per second).
