Conference: European Conference on Computer Vision (ECCV 2006) Workshop on Human-Computer Interaction (HCI); 20060513; Graz (AT)

Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm

Abstract

This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses MFCCs of a wavelet-based multiresolution spectrum, together with two classical audio parameters, as its audio features. It prejudges silence by detecting a multi-gate zero-crossing ratio, and classifies noise and voice with Support Vector Machines (SVM). The new speech mixing algorithm, used in the Multipoint Control Unit (MCU) of a conference, uses the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing. Experiments show that the proposed VAD algorithm achieves better overall performance at all SNRs than the G.729b VAD and other VADs; that the output audio of the new speech mixing algorithm has excellent perceptual quality; that its computational delay is small enough to satisfy the needs of real-time transmission; and that the MCU computation is lower than that based on the G.729b VAD.
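
The abstract gives only a one-line description of the mixing step, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "short-time power" means the mean squared amplitude of a frame and that the weight vector is each stream's share of the total power; the function names `short_time_power` and `mix_frames` and the `eps` silence threshold are hypothetical.

```python
from typing import List, Sequence


def short_time_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (assumed definition of short-time power)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def mix_frames(frames: List[Sequence[float]], eps: float = 1e-12) -> List[float]:
    """Mix equal-length frames, weighting each stream by its share of the total power.

    Assumption: weights are normalized to sum to 1, so louder streams
    dominate the mix and the output cannot exceed the loudest input.
    """
    if not frames:
        return []
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all frames must have the same length")
    powers = [short_time_power(f) for f in frames]
    total = sum(powers)
    if total < eps:
        # Every stream is (near) silent: emit silence instead of dividing by ~0.
        return [0.0] * n
    weights = [p / total for p in powers]
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(n)]
```

Because each frame's weights depend only on that frame's samples, frames from different time steps can be mixed independently, which is one way the "designed for parallel processing" property could be realized.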
