Visual speech recognition for small scale dataset using VGG16 convolution neural network

Shashidhar R.; Patilkulkarni Sudarshan


Abstract

Visual speech recognition is a method that interprets speech from a speaker's lip movements, recognizing what is said solely from the shape and movement of the lips. Implementing this technique not only helps people with hearing impairments but also supports professional lip reading, with applications in crime investigation and forensics. It plays a crucial role in these domains because a speaker's speech can be converted to text. This work proposes an enhanced technique for visual speech recognition from video. A dataset was created and used for both implementation and verification. The aim of the approach is to recognize words from lip movements in video alone, without audio; this makes it possible to extract words from silent video, which aids forensic and crime analysis. The proposed method employs the pre-trained VGG16 convolutional neural network architecture for classification and recognition. The visual modality was observed to improve the performance of the speech recognition system. Finally, the results were compared with the Hahn Convolutional Neural Network (HCNN) architecture. The proposed model achieves an accuracy of 76% in visual speech recognition.
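The abstract states only that a pre-trained VGG16 network is used for classification. As a hedged illustration of what such a transfer-learning word classifier can look like, the following is a minimal Keras sketch, assuming ImageNet weights, 224x224 RGB lip-region frames, a frozen convolutional base, and a small classification head (global average pooling plus a 256-unit dense layer). The function name `build_lip_classifier`, the input size, the head layers, and the optimizer are all illustrative assumptions, not the authors' published configuration.

```python
from typing import Optional

import tensorflow as tf


def build_lip_classifier(num_classes: int, weights: Optional[str] = "imagenet") -> tf.keras.Model:
    """Hypothetical VGG16 transfer-learning classifier for lip-region frames.

    Assumes inputs are 224x224 RGB images already passed through
    tf.keras.applications.vgg16.preprocess_input.
    """
    # Convolutional base only; the original 1000-class ImageNet head is dropped.
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=(224, 224, 3)
    )
    base.trainable = False  # freeze the pre-trained features (assumption)

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)  # hypothetical head size
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer word labels
        metrics=["accuracy"],
    )
    return model
```

For example, `build_lip_classifier(num_classes=10)` would build a classifier for a hypothetical 10-word vocabulary, which could then be trained with `model.fit` on labelled frames. Passing `weights=None` skips the ImageNet download, which is convenient for shape checks but discards the pre-training the abstract relies on.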


Bibliographic details

Source: Multimedia Tools and Applications, 2021, no. 19, pp. 28941-28952 (12 pages)
Authors: Shashidhar R.; Patilkulkarni Sudarshan
Affiliation (both authors): JSS Science and Technology University, Department of Electronics and Communication Engineering, Mysuru 570006, India

Original format: PDF
Language: English
Keywords: Visual speech recognition; Lip-reading; Convolutional neural network; VGG16


Similar documents


1. Training Convolutional Neural Network for Sketch Recognition on Large-Scale Dataset [J]. Zhou Wen, Jia Jinyuan. The International Arab Journal of Information Technology, 2020, no. 1.

2. Convolutional neural networks based on multi-scale additive merging layers for visual smoke recognition [J]. Yuan Feiniu, Zhang Lin, Wan Boyang. Machine Vision and Applications, 2019, no. 2.

3. End-to-end visual speech recognition for small-scale datasets [J]. Petridis Stavros, Wang Yujiang, Ma Pingchuan. Pattern Recognition Letters, 2020, issue Mara.

4. Face Recognition Using Light-Convolutional Neural Networks Based On Modified Vgg16 Model [C]. Anugrah Bintang Perdana, Adhi Prahara. International Conference of Computer Science and Information Technology, 2019.

5. Convolutional Neural Networks for Speaker-Independent Speech Recognition [D]. Belilovsky, Eugene. 2011.

6. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network [O]. Seung Seog Han, Gyeong Hun Park, Woohyung Lim.

7. Visual Speech Recognition using VGG16 Convolutional Neural Network [O]. Shashidhar R, S Patilkulkarni, Nishanth S Murthy. 2021.
