Conference: Pacific-Rim Symposium on Image and Video Technology

Robust Visual Voice Activity Detection Using Long Short-Term Memory Recurrent Neural Network

Abstract

Many traditional visual voice activity detection systems rely on features extracted from mouth region images, which are sensitive to noisy observations in the visual domain. In addition, the hyperparameters of the feature extraction process, which set the compromise between robustness, efficiency, and accuracy of the algorithm, are difficult to determine. Therefore, a visual voice activity detection algorithm is proposed that uses only simple lip shape information as features and a Long Short-Term Memory recurrent neural network (LSTM-RNN) as a classifier. Face detection is performed by a structural SVM based on histogram of oriented gradients (HOG) features. The detected face template is used to initialize a kernelized correlation filter tracker. Facial landmark coordinates are then extracted from the tracked face. A centroid distance function is applied to the geometrically normalized landmarks surrounding the outer and inner lip contours. Finally, discriminative (LSTM-RNN) and generative (Hidden Markov Model) methods are used to model the temporal lip shape sequences during speech and non-speech intervals, and their classification performances are compared. Experimental results show that the proposed algorithm using the LSTM-RNN can achieve a classification rate of 98% in labeling speech and non-speech periods. It is robust and efficient for real-time applications.
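
The centroid distance step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `centroid_distance` and `lip_shape_features`, the toy four-point contours, and the choice to normalize by the mean outer-contour distance are all assumptions made for illustration, since the abstract does not specify the normalization or the number of landmarks.

```python
import numpy as np


def centroid_distance(contour):
    """Return the Euclidean distance from each contour point to the contour centroid."""
    pts = np.asarray(contour, dtype=float)  # shape (N, 2): one (x, y) row per landmark
    centroid = pts.mean(axis=0)
    return np.linalg.norm(pts - centroid, axis=1)


def lip_shape_features(outer, inner):
    """Build one per-frame feature vector from outer and inner lip landmarks.

    Assumption: dividing by the mean outer-contour distance stands in for the
    geometric normalization, making the features invariant to uniform scaling.
    """
    d_outer = centroid_distance(outer)
    d_inner = centroid_distance(inner)
    scale = d_outer.mean()
    return np.concatenate([d_outer, d_inner]) / scale


# Toy contours (hypothetical data, four points each).
outer = [(2.0, 0.0), (0.0, 1.0), (-2.0, 0.0), (0.0, -1.0)]
inner = [(1.0, 0.0), (0.0, 0.5), (-1.0, 0.0), (0.0, -0.5)]
features = lip_shape_features(outer, inner)
```

Stacking one such vector per video frame would give the temporal lip shape sequence that a sequence classifier such as an LSTM-RNN or an HMM could then label as speech or non-speech.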