Journal: Multimedia Tools and Applications

Visual speech recognition for small scale dataset using VGG16 convolution neural network



Abstract

Visual speech recognition interprets speech from a speaker's lip movements alone; the speech is validated only by the shape and movement of the lips. This technique not only helps people with hearing impairments but can also be used for professional lip reading, with applications in crime investigation and forensics. It plays a crucial role in these domains because a speaker's words are converted to text. This work proposes an enhanced technique for visual speech recognition from video. A dataset was created and used for both implementation and verification. The aim was to recognize words from lip movement alone, using video without audio, which helps extract words from silent video for forensic and crime analysis. The proposed method employs the pre-trained VGG16 Convolutional Neural Network architecture for classification and recognition. The visual modality was observed to improve the performance of the speech recognition system. Finally, the results were compared with the Hahn Convolutional Neural Network (HCNN) architecture. The proposed model achieves an accuracy of 76% in visual speech recognition.
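
The abstract names a pre-trained VGG16 network as the classifier but gives no implementation details. The following is a minimal transfer-learning sketch in Keras, not the authors' code. It assumes each sample reaches the network as a single 224×224 RGB image of the lip region; the function name `build_vgg16_classifier`, the 256-unit head, the optimizer, and the `num_classes` value are all hypothetical choices made for illustration.

```python
import tensorflow as tf


def build_vgg16_classifier(num_classes, input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen VGG16 base plus a small classification head (head is an assumption)."""
    # Convolutional base without the original ImageNet classifier layers.
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    # Freeze the pre-trained filters so only the new head is trained,
    # a common choice when the dataset is small.
    base.trainable = False

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)  # hypothetical head size
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer word labels
        metrics=["accuracy"],
    )
    return model
```

A call such as `build_vgg16_classifier(num_classes=10)` would return a model ready for `fit`, where `10` stands in for the vocabulary size, which the abstract does not state. Input images would typically be passed through `tf.keras.applications.vgg16.preprocess_input` first. How the video frames are cropped and combined into network inputs is not described in the abstract and is left out of this sketch.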
