International Conference on Automatic Face and Gesture Recognition

Are You Speaking: Real-Time Speech Activity Detection via Landmark Pooling Network

Abstract

In this paper, we propose a novel framework based on visual information to solve the real-time speech activity detection problem. Unlike conventional methods, which commonly use the audio signal as input, our approach incorporates facial information into a deep neural network for feature learning. Instead of using the whole input image, we further develop a novel end-to-end landmark pooling network that acts as an attention-guiding scheme, helping the deep neural network focus only on the relevant portion of the input image. This allows the network to learn highly discriminative features for speech activity precisely and efficiently. Furthermore, we implement a recurrent neural network with gated recurrent units (GRUs) to exploit the sequential information in the video and produce the final decision. To evaluate the proposed method comprehensively, we collect a large-scale dataset of unconstrained speech activities, consisting of a large number of speech/non-speech video sequences under various kinds of degradation. Experimental results demonstrate that our proposed pipeline outperforms previous approaches in both performance and efficiency.
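The abstract does not give layer-level details, so the following is only a minimal, pure-Python sketch of how the two described ideas could fit together, under stated assumptions. It assumes that "landmark pooling" means average-pooling a small window of a feature map around each facial landmark and concatenating the results (ignoring the rest of the image), and that a single GRU cell with random, untrained weights consumes these per-frame vectors, with the final hidden state mapped to a speech probability. The names `landmark_pool`, `GRUCell`, and `speech_probability`, the window radius, the toy feature maps, and the mouth landmark coordinates are all hypothetical and not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
FeatureMap = Sequence[Sequence[Vector]]  # H x W grid of C-channel vectors


def landmark_pool(feature_map: FeatureMap,
                  landmarks: Sequence[Tuple[float, float]],
                  radius: int = 1) -> Vector:
    """Average-pool a (2*radius+1)^2 window around each landmark and concatenate.

    Landmarks are (x, y) pairs normalized to [0, 1]. Everything outside the
    windows is ignored, so this behaves like a hard, landmark-guided attention
    mask (an assumption about the paper's design, not its actual layer).
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled: Vector = []
    for x, y in landmarks:
        # Map normalized coordinates onto the feature-map grid (clamped).
        cx = min(w - 1, max(0, int(round(x * (w - 1)))))
        cy = min(h - 1, max(0, int(round(y * (h - 1)))))
        cells = [feature_map[r][c]
                 for r in range(max(0, cy - radius), min(h, cy + radius + 1))
                 for c in range(max(0, cx - radius), min(w, cx + radius + 1))]
        for k in range(len(cells[0])):
            pooled.append(sum(cell[k] for cell in cells) / len(cells))
    return pooled


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


class GRUCell:
    """Standard GRU cell (biases omitted for brevity), randomly initialized."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # W_* multiply the input x_t, U_* multiply the previous hidden state.
        self.Wz, self.Uz = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.Wr, self.Ur = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.Wn, self.Un = init(hidden_size, input_size), init(hidden_size, hidden_size)

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.Wn, x), matvec(self.Un, rh))]  # candidate
        # New state interpolates between the candidate and the previous state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def speech_probability(frames: Sequence[FeatureMap],
                       landmarks_per_frame: Sequence[Sequence[Tuple[float, float]]],
                       cell: GRUCell, w_out: Vector, b_out: float = 0.0) -> float:
    """Run the GRU over landmark-pooled frame features; map final state to P(speech)."""
    h = [0.0] * cell.hidden_size
    for fmap, lms in zip(frames, landmarks_per_frame):
        h = cell.step(landmark_pool(fmap, lms), h)
    return sigmoid(sum(wi * hi for wi, hi in zip(w_out, h)) + b_out)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy clip: 5 frames, each a 4x4 feature map with 2 channels (stand-in for CNN output).
    frames = [[[[rng.random(), rng.random()] for _ in range(4)] for _ in range(4)]
              for _ in range(5)]
    # Hypothetical mouth landmarks: left corner, right corner, lip center.
    mouth = [(0.3, 0.7), (0.7, 0.7), (0.5, 0.75)]
    cell = GRUCell(input_size=3 * 2, hidden_size=8)  # 3 landmarks x 2 channels
    p = speech_probability(frames, [mouth] * len(frames), cell, w_out=[0.5] * 8)
    print(f"P(speaking) = {p:.3f}")
```

Pooling only at landmark locations keeps the recurrent input small (here 3 landmarks × 2 channels = 6 values per frame instead of the full 4×4×2 map), which is one plausible reason such a scheme could be both accurate and cheap enough for real-time use; the paper's actual network is trained end-to-end and may differ substantially from this sketch.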