Science Journal of Circuits, Systems and Signal Processing

A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image


Abstract

This paper presents a voiced/unvoiced classification algorithm for noisy speech signals based on two acoustic features of the signal. Short-time energy and short-time zero-crossing rate are among the most distinctive time-domain features for classifying the voicing activity of a speech signal into voiced and unvoiced segments. The proposed approach processes the narrowband speech signal frame by frame using its spectrogram image. Two time-domain features, short-time energy (STE) and short-time zero-crossing rate (ZCR), are used to classify the voiced and unvoiced parts. In the first stage, each frame of the analyzed spectrogram is divided into three separate sub-bands, and the short-time energy ratio pattern of these sub-bands is examined. An energy ratio pattern-matching look-up table is then used to classify the voicing activity. This stage successfully classifies patterns 1 through 4 but fails on the remaining patterns in the look-up table. The remaining patterns are therefore resolved in a second stage, where the frame-wise short-time average zero-crossing rate is compared with a threshold value. In this study, the threshold value is calculated from the short-time average zero-crossing rate of white Gaussian noise (wGn). The accuracy of the proposed method is evaluated using both male and female speech waveforms under different signal-to-noise ratios (SNRs). Experimental results show that the proposed method achieves better accuracy than conventional methods in the literature.
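
The two time-domain features and the noise-derived threshold described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame length, the use of the mean ZCR of synthetic white Gaussian noise as the threshold, and the decision direction (ZCR below the threshold is treated as voiced) are all assumptions made for illustration. The sub-band energy ratio look-up table of the first stage is not specified in the abstract and is not reproduced here.

```python
import math
import random


def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames (trailing partial frame dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Short-time energy (STE): sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Short-time ZCR: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def wgn_zcr_threshold(frame_len, n_frames=200, seed=0):
    """Assumed threshold: average ZCR over frames of synthetic white Gaussian noise."""
    rng = random.Random(seed)
    rates = [zero_crossing_rate([rng.gauss(0.0, 1.0) for _ in range(frame_len)])
             for _ in range(n_frames)]
    return sum(rates) / len(rates)


def classify_by_zcr(frame, threshold):
    """Assumed rule: ZCR below the threshold means 'voiced', otherwise 'unvoiced'."""
    return "voiced" if zero_crossing_rate(frame) < threshold else "unvoiced"


# Usage: a 100 Hz tone sampled at 8 kHz stands in for a voiced frame.
FS = 8000
FRAME_LEN = 256
tone = [math.sin(2 * math.pi * 100 * n / FS) for n in range(FRAME_LEN)]
threshold = wgn_zcr_threshold(FRAME_LEN)
label = classify_by_zcr(tone, threshold)
```

A low-frequency periodic frame crosses zero only a few times per frame, while white noise changes sign on roughly half of its adjacent sample pairs, which is why a noise-derived ZCR value is a plausible dividing line between the two classes.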
