Conference: International Technical Conference on Circuits/Systems, Computers and Communications

Speech Activity Detection Using a Fusion of Dense Convolutional Network in the Movie Audio

Abstract

Speech activity detection (SAD) is a critical preprocessing step for speech-based applications: it identifies the speech regions in an audio recording. This paper proposes a CNN-based SAD method for the entertainment media domain. Two Dense Convolutional Networks (DenseNets), each using a different feature extraction, are fused with Dempster-Shafer theory (DS theory) to classify speech segments. The input combines several acoustic features: the log-mel spectrogram (LM), mel-frequency cepstral coefficients (MFCC), chroma, spectral contrast, and tonnetz. These features are computed from the raw speech signal and fed into a convolutional neural network for speech classification. The results show that the proposed method outperforms previous work (+1% accuracy, +8% precision, and +5% F1 score) in a more complicated noise environment.
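
To make the fusion step concrete, here is a minimal sketch of Dempster's rule of combination applied to two classifier outputs. The abstract does not say how the DenseNet outputs are turned into mass functions, so this sketch assumes each network's speech probability is used directly as a mass over the two hypotheses. The names `dempster_combine`, `prob_to_mass`, and `fuse_speech_probability` are hypothetical and not taken from the paper.

```python
from itertools import product

# Frame of discernment for SAD: each frame is either speech or non-speech.
SPEECH = frozenset({"speech"})
NONSPEECH = frozenset({"nonspeech"})


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of the two hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint hypotheses contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; Dempster's rule is undefined.")
    # Normalize by 1 - K so the combined masses sum to one.
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


def prob_to_mass(p_speech):
    """Assumption: treat a network's speech probability as a mass function."""
    return {SPEECH: p_speech, NONSPEECH: 1.0 - p_speech}


def fuse_speech_probability(p_net1, p_net2):
    """Fused belief in 'speech' given the speech probabilities of two networks."""
    fused = dempster_combine(prob_to_mass(p_net1), prob_to_mass(p_net2))
    return fused.get(SPEECH, 0.0)


if __name__ == "__main__":
    # Example: one network is fairly confident, the other only mildly so.
    print(fuse_speech_probability(0.8, 0.6))
```

With masses placed only on the two singleton hypotheses, the rule reduces to a normalized product of the two probabilities, so two networks that agree reinforce each other. A fuller treatment could also assign mass to the whole frame {speech, nonspeech} to represent each network's uncertainty, which the general `dempster_combine` function already supports.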
