Conference: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition

Energy and Computation Efficient Audio-Visual Voice Activity Detection Driven by Event-Cameras




We propose a novel method for computationally efficient audio-visual voice activity detection (VAD) in which visual temporal information is provided by an energy-efficient event-camera (EC). Unlike conventional cameras, ECs perform on-chip, low-power, pixel-level change detection, adapting the sampling frequency to the dynamics of the activity in the visual scene and removing redundancy, which enables energy and computational efficiency. In our VAD pipeline, lip activity is first located and detected jointly by a probabilistic estimation after spatio-temporal filtering. Then, over the lips, a featherweight speech-related lip motion detection is performed with a minimum false negative rate to activate a highly accurate but expensive acoustic VAD based on deep neural networks. Our experiments show that ECs are accurate at detecting and locating lip activity, and that EC-driven VAD can yield considerable savings in computation and can substantially reduce false positive rates in low acoustic signal-to-noise ratio conditions.
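The two-stage structure described in the abstract, a cheap visual gate that decides when to run an expensive acoustic detector, can be illustrated with a minimal sketch. This is not the authors' implementation: the event-count threshold stands in for their probabilistic lip-motion detector, the energy-based `energy_vad` stands in for the acoustic deep neural network, and all names and parameter values (`cascaded_vad`, `gate_threshold`, `energy_threshold`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def energy_vad(samples: Sequence[float], energy_threshold: float = 0.1) -> bool:
    """Stand-in for the expensive acoustic VAD (assumption: the paper uses a DNN).

    Returns True when the mean signal energy of the window exceeds the threshold.
    """
    if not samples:
        return False
    mean_energy = sum(s * s for s in samples) / len(samples)
    return mean_energy > energy_threshold


def cascaded_vad(
    lip_event_counts: Sequence[int],
    audio_windows: Sequence[Sequence[float]],
    acoustic_vad: Callable[[Sequence[float]], bool],
    gate_threshold: int,
) -> Tuple[List[bool], int]:
    """Run the acoustic VAD only on windows where the visual gate fires.

    lip_event_counts[i] is the (hypothetical) number of camera events observed
    over the lip region during window i. A low gate_threshold keeps the gate's
    false negative rate small at the cost of more acoustic VAD calls.
    Returns the per-window speech decisions and the number of acoustic calls.
    """
    decisions: List[bool] = []
    acoustic_calls = 0
    for count, audio in zip(lip_event_counts, audio_windows):
        if count >= gate_threshold:
            # Lip motion detected: pay for the accurate acoustic check.
            acoustic_calls += 1
            decisions.append(acoustic_vad(audio))
        else:
            # No lip motion: skip the acoustic model and report non-speech.
            decisions.append(False)
    return decisions, acoustic_calls


if __name__ == "__main__":
    counts = [0, 5, 12, 1]
    audio = [[0.9, 0.9], [0.0, 0.0], [0.8, -0.7], [0.5, 0.5]]
    print(cascaded_vad(counts, audio, energy_vad, gate_threshold=3))
```

In this toy input, the first window is acoustically loud but shows no lip events, so the gate suppresses it without calling the acoustic detector. That mirrors, in a simplified way, how a visual gate could both save computation and reduce false positives in noisy audio.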




