Dynamic Facial Dataset Capture and Processing for Visual Speech Recognition using an RGB-D Sensor

Naveed Ahmed; Mohammed Lataifeh; Imran Junejo

Abstract

This work presents a new RGB-D acquisition system for capturing a comprehensive dynamic facial dataset that can be used for visual speech recognition. The acquisition system uses a Kinect to record detailed facial features of a person. The dataset comprises the facial data of 20 individuals saying 20 common English words or phrases. The system employs Kinect facial tracking, which records a large number of dynamic facial features: facial points, facial outline, RGB data, depth data, the mapping between RGB and depth data, facial animation units, facial shape units, 2D and 3D representations of the face, and the 3D head orientation. The effectiveness of the acquired RGB-D dynamic facial dataset is demonstrated by presenting a new visual speech recognition method that employs three-dimensional spatiotemporal data of different facial feature points. A number of visual speech recognition methods from the literature are also tested on the new dataset, and they obtain comparable or favorable results. The results show that the proposed dataset can be effectively employed in a visual speech recognition system.
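
To make "three-dimensional spatiotemporal data of facial feature points" concrete, here is a minimal sketch of how a clip of tracked 3D points could be turned into a fixed-size vector and matched against labelled clips. It is not the authors' method: the abstract does not say how features are built or which classifier is used. The centroid centering, the nearest-frame resampling, the nearest-neighbour rule, and every name below (`center`, `resample`, `descriptor`, `classify`, `make_clip`) are assumptions made purely for illustration, and the data is synthetic.

```python
from math import dist

Point = tuple[float, float, float]
Frame = list[Point]  # one tracked face in one frame: a list of 3D feature points


def center(frame: Frame) -> Frame:
    """Subtract the centroid so that global head translation is removed."""
    n = len(frame)
    cx = sum(p[0] for p in frame) / n
    cy = sum(p[1] for p in frame) / n
    cz = sum(p[2] for p in frame) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in frame]


def resample(frames: list[Frame], length: int) -> list[Frame]:
    """Pick `length` frames at evenly spaced positions (nearest-frame sampling)."""
    if length == 1:
        return [frames[0]]
    last = len(frames) - 1
    return [frames[round(i * last / (length - 1))] for i in range(length)]


def descriptor(frames: list[Frame], length: int = 8) -> list[float]:
    """Flatten a clip into one vector of length * points * 3 coordinates."""
    return [c for f in resample(frames, length) for p in center(f) for c in p]


def classify(clip: list[Frame], labelled: list[tuple[str, list[Frame]]],
             length: int = 8) -> str:
    """Return the label of the nearest labelled clip (Euclidean distance)."""
    query = descriptor(clip, length)
    best = min(labelled, key=lambda item: dist(query, descriptor(item[1], length)))
    return best[0]


def make_clip(opening: float, n_frames: int = 10) -> list[Frame]:
    """Synthetic clip (assumption): two lip points; the lower one drops by `opening`."""
    return [
        [(0.0, 0.0, 1.0), (0.0, -opening * t / (n_frames - 1), 1.0)]
        for t in range(n_frames)
    ]
```

Resampling to a fixed number of frames lets clips of different durations be compared directly, at the cost of discarding timing detail; a real system would more likely use many tracked points and a learned model.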

Bibliographic details

Source: IAENG International Journal of Computer Science, 2020, Issue 2, pp. 786-791 (6 pages)
Authors: Naveed Ahmed; Mohammed Lataifeh; Imran Junejo
Affiliations:
Department of Computer Science, University of Sharjah, Sharjah 27272, UAE
Department of Computer Science, University of Sharjah, Sharjah 27272, UAE
College of Technological Innovation, Zayed University, UAE
Original format: PDF
Language: English
Keywords: RGB-D; Kinect; Facial Dataset; Visual Speech Recognition; Facial Tracking

Similar literature

1. End-to-end visual speech recognition for small-scale datasets [J]. Petridis Stavros, Wang Yujiang, Ma Pingchuan. Pattern Recognition Letters, 2020, Issue Mara.
2. The self-advantage in visual speech processing enhances audiovisual speech recognition in noise [J]. Tye-Murray Nancy, Spehar Brent P., Myerson Joel. Psychonomic Bulletin & Review, 2015, Issue 4.
3. Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions [J]. Simon Alexanderson, Jonas Beskow. Computer Speech and Language, 2014, Issue 2.
4. RGB-D dynamic facial dataset capture for visual speech recognition [C]. Naveed Ahmed. International Conference on Image and Video Processing, and Artificial Intelligence, 2019.
5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005.
6. Adaptive user interface design and analysis using emotion recognition through facial expressions and body posture from an RGB-D sensor [O]. Selma Medjden, Naveed Ahmed, Mohammed Lataifeh. 2020.
7. Empirical assessment of a RGB-D sensor on motion capture and action recognition for construction worker monitoring [O]. SangUk Han, Madhav Achar, SangHyun Lee. 2013.
