Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

REAL-LIFE VOICE ACTIVITY DETECTION WITH LSTM RECURRENT NEURAL NETWORKS AND AN APPLICATION TO HOLLYWOOD MOVIES


Abstract

A novel, data-driven approach to voice activity detection is presented. The approach is based on Long Short-Term Memory Recurrent Neural Networks trained on standard RASTA-PLP frontend features. To approximate real-life scenarios, a large number of noisy speech instances is generated by mixing read and spontaneous speech from the TIMIT and Buckeye corpora with real long-term recordings of diverse noise types. The approach is evaluated on unseen synthetically mixed test data as well as on a real-life test set consisting of four full-length Hollywood movies. A frame-wise Equal Error Rate (EER) of 33.2% is obtained for the four movies, and an EER of 9.6% is obtained for the synthetic test data at a peak SNR of 0 dB, clearly outperforming three state-of-the-art reference algorithms under the same conditions.
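The abstract reports results as a frame-wise Equal Error Rate, i.e. the operating point at which the false alarm rate (non-speech frames labeled as speech) equals the miss rate (speech frames labeled as non-speech). The following is a minimal sketch of how such a figure could be computed from per-frame detector scores. It is not the authors' evaluation code: the function name `frame_wise_eer`, the "score >= threshold means speech" convention, and the choice to approximate the EER at the threshold where the two rates are closest are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def frame_wise_eer(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (eer, threshold) for per-frame speech scores.

    labels: 1 = speech frame, 0 = non-speech frame (assumed convention).
    A frame is classified as speech when score >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_speech = sum(1 for y in labels if y == 1)
    n_nonspeech = len(labels) - n_speech
    if n_speech == 0 or n_nonspeech == 0:
        raise ValueError("need at least one speech and one non-speech frame")

    # Sort by descending score: lowering the threshold admits frames one by one.
    order = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    best_gap, best_eer, best_thr = float("inf"), 1.0, float("inf")
    accepted_speech = 0
    accepted_nonspeech = 0
    i = 0
    while i < len(order):
        thr = order[i][0]
        # Admit every frame tied at this score before evaluating the threshold.
        while i < len(order) and order[i][0] == thr:
            if order[i][1] == 1:
                accepted_speech += 1
            else:
                accepted_nonspeech += 1
            i += 1
        false_alarm_rate = accepted_nonspeech / n_nonspeech
        miss_rate = 1.0 - accepted_speech / n_speech
        gap = abs(false_alarm_rate - miss_rate)
        if gap < best_gap:
            # Closest crossing so far: report the mean of the two rates.
            best_gap = gap
            best_eer = (false_alarm_rate + miss_rate) / 2.0
            best_thr = thr
    return best_eer, best_thr


# Toy example with eight frames (made-up scores, not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
eer, thr = frame_wise_eer(toy_scores, toy_labels)
print(f"EER = {eer:.3f} at threshold {thr}")  # EER = 0.250 at threshold 0.6
```

With real data the two error curves rarely cross exactly at an observed score, so the returned value is an approximation; interpolating between neighboring thresholds would be a reasonable refinement.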
