This study focuses on an unsupervised approach for detecting human scream vocalizations in continuous recordings made in noisy acoustic environments. The proposed detection solution is based on compound segmentation, which employs weighted mean distance, T-statistics and the Bayesian Information Criterion (BIC) to detect screams. In a preliminary stage, the solution also employs an unsupervised, threshold-optimized Combo-SAD to remove non-vocal noisy segments. Five noisy environments were simulated at SNR levels ranging from −20 dB to +20 dB. The performance of the proposed system was compared using two alternative acoustic front-end features: (i) Mel-frequency cepstral coefficients (MFCC) and (ii) perceptual minimum variance distortionless response (PMVDR). Evaluation results show that the new scream detection solution works well under clean, +20 dB and +10 dB SNR conditions, with performance declining as SNR decreases to −20 dB for several of the noise sources considered.
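The abstract gives no implementation details, but the BIC component of the compound segmentation can be illustrated with the standard ΔBIC change-point test commonly used in acoustic segmentation. The following is a generic sketch, not the authors' implementation. It assumes each analysis window is a matrix of feature frames (for example, MFCC or PMVDR vectors) and that each segment is modeled by a single full-covariance Gaussian. The names `delta_bic`, `best_split`, `min_frames` and `penalty_weight` (the usual tuning factor λ on the BIC penalty) are hypothetical. A positive ΔBIC favors modeling the window as two segments, which indicates a change point.

```python
import numpy as np


def _logdet_cov(frames: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood (biased) covariance of an (N x d) frame matrix."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("singular covariance: use more frames than feature dimensions")
    return float(logdet)


def delta_bic(x: np.ndarray, y: np.ndarray, penalty_weight: float = 1.0) -> float:
    """Generic Delta-BIC between 'one Gaussian for x+y' and 'separate Gaussians for x and y'.

    Delta-BIC = N/2 log|S_z| - N1/2 log|S_x| - N2/2 log|S_y| - lambda * P,
    with P = 1/2 (d + d(d+1)/2) log N. Positive values favour a change point.
    """
    z = np.vstack([x, y])
    n, d = z.shape
    n1, n2 = len(x), len(y)
    # Extra parameters of the two-model hypothesis: one more mean (d) and covariance (d(d+1)/2).
    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)
    return float(
        0.5 * n * _logdet_cov(z)
        - 0.5 * n1 * _logdet_cov(x)
        - 0.5 * n2 * _logdet_cov(y)
        - penalty_weight * penalty
    )


def best_split(frames: np.ndarray, min_frames: int = 20, penalty_weight: float = 1.0):
    """Scan every admissible split of one window; return (index, score) or (None, score).

    min_frames (hypothetical default) should exceed the feature dimension d so that
    each side has a non-singular covariance estimate.
    """
    best_t, best_score = None, -np.inf
    for t in range(min_frames, len(frames) - min_frames + 1):
        score = delta_bic(frames[:t], frames[t:], penalty_weight)
        if score > best_score:
            best_t, best_score = t, score
    # Report a boundary only when the best Delta-BIC is positive.
    return (best_t, best_score) if best_score > 0 else (None, best_score)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 13-dimensional "feature frames" with a mean shift at frame 200.
    frames = np.vstack([rng.normal(0.0, 1.0, (200, 13)), rng.normal(3.0, 1.0, (200, 13))])
    print(best_split(frames))
```

On the synthetic data above, `best_split` is expected to place the boundary at or very near frame 200, and on homogeneous frames it should return `None`. This sketch covers only the BIC criterion. The weighted mean distance and T-statistic stages mentioned in the abstract are omitted, and in a full detector such boundaries would serve only as candidate segment edges.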