Are You Speaking: Real-Time Speech Activity Detection via Landmark Pooling Network

Abstract

In this paper, we propose a novel framework based on visual information to solve the real-time speech activity detection problem. Unlike conventional methods, which commonly use the audio signal as input, our approach incorporates facial information into a deep neural network for feature learning. Instead of using the whole input image, we further develop a novel end-to-end landmark pooling network that acts as an attention-guiding scheme, helping the deep neural network focus only on the relevant portion of the input image. This helps the network precisely and efficiently learn highly discriminative features for speech activities. In addition, we implement a recurrent neural network with gated recurrent units to exploit the sequential information in video and produce the final decision. To give a comprehensive evaluation of the proposed method, we collect a large-scale dataset of unconstrained speech activities, consisting of a large number of speech/non-speech video sequences under various kinds of degradation. Experimental results demonstrate the superiority of our proposed pipeline over previous approaches in terms of performance and efficiency.
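The abstract does not spell out how landmark pooling or the recurrent stage is defined, so the following is only a minimal sketch of the general idea, under stated assumptions: landmark pooling is approximated here by max-pooling a 2-D feature map in a small square window around each facial landmark, and the per-frame pooled vectors are fed through a textbook GRU update. The names `landmark_pool`, `gru_step`, `speech_probability`, the `radius` parameter, and the parameter layout `p` are all hypothetical and are not taken from the paper.

```python
import math


def landmark_pool(feature_map, landmarks, radius=1):
    """Max-pool a 2-D feature map in a square window around each (x, y) landmark.

    Assumption: this is a simplified stand-in for the paper's landmark pooling.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for x, y in landmarks:
        # Clip the window to the feature-map borders.
        rows = range(max(0, y - radius), min(height, y + radius + 1))
        cols = range(max(0, x - radius), min(width, x + radius + 1))
        pooled.append(max(feature_map[r][c] for r in rows for c in cols))
    return pooled


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def _affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(h, x, p):
    """One standard GRU update; p maps 'z', 'r', 'c' to (W, U, b) triples."""
    Wz, Uz, bz = p["z"]
    Wr, Ur, br = p["r"]
    Wc, Uc, bc = p["c"]
    z = [_sigmoid(v) for v in _affine(Wz, x, Uz, h, bz)]  # update gate
    r = [_sigmoid(v) for v in _affine(Wr, x, Ur, h, br)]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in _affine(Wc, x, Uc, gated_h, bc)]
    # Blend the previous state with the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def speech_probability(frames, landmarks, p, w_out, b_out=0.0):
    """Run the GRU over per-frame pooled features and score the final state."""
    h = [0.0] * len(w_out)
    for fmap in frames:
        h = gru_step(h, landmark_pool(fmap, landmarks), p)
    return _sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + b_out)
```

In this sketch the landmarks play the role of the attention guide: only features near them reach the recurrent stage, so the sequence model never sees the rest of the image. A real system would learn the feature maps and GRU weights end to end, which this illustration does not attempt.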

Bibliographic record

Source
International Conference on Automatic Face and Gesture Recognition | 2019 | 753p | 5 pages
Authors
Boyu Wang; Xiaolong Wang
Original format
PDF
Chinese Library Classification
Information processing
Keywords
audio signal processing; image sequences; learning (artificial intelligence); recurrent neural nets; speech processing; speech recognition; video signal processing

