Journal: Optical memory & neural networks

About Neural-Network Algorithms Application in Viseme Classification Problem with Face Video in Audiovisual Speech Recognition Systems


Abstract

The paper considers phoneme recognition from a speaker's facial expressions in voice-activated control systems. We have developed a neural-network recognition algorithm based on the phonetic word decoding method and on the requirement that voice commands be pronounced as isolated syllables. The paper presents experimental results on classifying the visemes (the facial and lip positions corresponding to a particular phoneme) of Russian vowels. We show how classification accuracy depends on the classifier (multilayer feed-forward network, support vector machine, k-nearest neighbor method), the image features (histogram of oriented gradients, eigenvectors, SURF local descriptors), and the type of camera (built-in or Kinect). The best speaker-dependent recognition accuracy is 85% for a built-in camera and 96% for Kinect depth maps, in both cases with histogram-of-oriented-gradients features and a support vector machine classifier.
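As a rough illustration of the feature-plus-classifier pipeline the abstract describes, the sketch below computes a simplified orientation histogram for a small grayscale image and classifies it with a k-nearest neighbor vote. It is not the authors' implementation. The histogram is a single-cell simplification of the histogram of oriented gradients (no cells, blocks, or block normalization). The k-nearest neighbor method is used here because it needs no external library, although the abstract reports the best results with a support vector machine. The function names (`orientation_histogram`, `knn_predict`, `make_edge`), the synthetic edge images, and their labels are all assumptions made for this example.

```python
import math
from collections import Counter


def orientation_histogram(image, bins=9):
    """Simplified single-cell histogram of oriented gradients.

    `image` is a 2D list of grayscale values. Gradient magnitudes are
    accumulated into `bins` unsigned-orientation bins over 0..180 degrees,
    then the histogram is L2-normalized.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central difference in x
            gy = image[y + 1][x] - image[y - 1][x]  # central difference in y
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle / bin_width), bins - 1)] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to `query`."""
    ranked = sorted(
        (math.dist(feature, query), label)
        for feature, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def make_edge(size, position, horizontal):
    """Hypothetical toy image: a step edge standing in for a mouth-region crop."""
    return [
        [1.0 if (y if horizontal else x) >= position else 0.0 for x in range(size)]
        for y in range(size)
    ]


# Synthetic training set: two toy classes, NOT real viseme data.
train_images = [make_edge(8, p, True) for p in (3, 4, 5)]
train_labels = ["horizontal_edge"] * 3
train_images += [make_edge(8, p, False) for p in (3, 4, 5)]
train_labels += ["vertical_edge"] * 3
train_features = [orientation_histogram(img) for img in train_images]

query_feature = orientation_histogram(make_edge(8, 2, True))
print(knn_predict(train_features, train_labels, query_feature, k=3))
```

In a real system the toy edge images would be replaced by cropped mouth-region frames (or depth maps) labeled with vowel visemes, and the simplified histogram by a full histogram-of-oriented-gradients descriptor.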
