Speech Activity Detection Using a Fusion of Dense Convolutional Network in the Movie Audio

Abstract

Speech activity detection (SAD) is a critical preprocessing step for speech-based applications: it identifies the speech regions in an audio recording. This paper proposes a CNN-based SAD method for the entertainment media domain. Two Dense Convolutional Networks (DenseNets), each using a different feature extraction, are fused with Dempster-Shafer theory (DS theory) to classify speech segments. The input features combine the log-mel spectrogram (LM), mel-frequency cepstral coefficients (MFCC), chroma, spectral contrast, and tonnetz. These acoustic features are computed from the raw audio signal and fed to the convolutional neural network, which classifies the speech. The results show that the proposed method outperforms previous work (+1% accuracy, +8% precision, and +5% F1 score) in a more complicated noise environment.
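
The abstract does not say how the two network outputs are converted into belief masses, so the following is only a minimal sketch of Dempster's rule of combination for a two-class frame (speech / non-speech). The helper `softmax_to_mass` and its `reliability` discount are hypothetical names and assumptions introduced for illustration; they are not taken from the paper.

```python
from itertools import product

SPEECH = frozenset({"speech"})
NONSPEECH = frozenset({"nonspeech"})
THETA = SPEECH | NONSPEECH  # the full frame: "could be either class"


def softmax_to_mass(p_speech, reliability):
    """Turn one classifier's speech probability into a mass function.

    `reliability` in [0, 1] is a hypothetical discount factor (an
    assumption, not from the paper): the remaining 1 - reliability is
    assigned to THETA to represent ignorance.
    """
    return {
        SPEECH: reliability * p_speech,
        NONSPEECH: reliability * (1.0 - p_speech),
        THETA: 1.0 - reliability,
    }


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise by the non-conflicting mass so the result sums to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Example: two networks that disagree in strength about one segment.
m_net_a = softmax_to_mass(p_speech=0.9, reliability=0.8)
m_net_b = softmax_to_mass(p_speech=0.6, reliability=0.7)
fused = dempster_combine(m_net_a, m_net_b)
label = "speech" if fused[SPEECH] > fused[NONSPEECH] else "nonspeech"
```

In this sketch, mass left on `THETA` lets a less reliable network defer to the other instead of forcing a hard vote; how the authors actually assign masses may differ.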

Bibliographic record

Source
International Technical Conference on Circuits/Systems, Computers and Communications | 2020 | pp. 9-14 | 6 pages
Authors
Pantid Chantangphol; Sasiporn Usanavasin; Jessada Karnjana; Surasak Boonkla; Suthum Keerativittayanun; Anocha Rugchatjaroen; Takahiro Shinozaki;
Original format: PDF
Keywords
Motion pictures; Feature extraction; Media; Entertainment industry; Voice activity detection; Acoustics;


