Morphological pure speech detection using valley percentage

Abstract

A human speech detection method detects pure-speech signals in an audio signal containing a mixture of pure-speech and non-speech or mixed-speech signals. The method accurately detects the pure-speech signals by computing a novel Valley Percentage feature from the audio signal and then classifying the audio signals into pure-speech and non-speech (or mixed-speech) classifications. The Valley Percentage is a measurement of the low energy parts of the audio signal (the valley) in comparison to the high energy parts of the audio signal (the mountain). To classify the audio signal, the method performs a threshold decision on the value of the Valley Percentage. Using a binary mask, a high Valley Percentage is classified as pure-speech and a low Valley Percentage is classified as non-speech (or mixed-speech). The method further employs morphological filters to improve the accuracy of human speech detection. Before detection, a morphological closing filter may be employed to eliminate unwanted noise from the audio signal. After detection, a combination of morphological closing and opening filters may be employed to remove aberrant pure-speech and non-speech classifications from the binary mask resulting from impulsive audio signals in order to more accurately detect the boundaries between the pure-speech and non-speech portions of the audio signal. A number of parameters may be employed by the method to further improve the accuracy of human speech detection. For implementation in supervised digital audio signal applications, these parameters may be optimized by training the application a priori. For implementation in an unsupervised environment, adaptive determination of these parameters is also possible.
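
The abstract describes the pipeline only in outline, so the following is a minimal Python sketch of one way it could look, not the patented implementation. It assumes that a frame counts as "valley" when its energy falls below a fixed fraction of the maximum energy in a sliding window. The abstract does not state this definition, and every name and parameter value here (`valley_ratio`, `vp_threshold`, `half_window`, `radius`) is hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def valley_percentage(energies, center, half_window, valley_ratio):
    """Fraction of frames in the window around `center` whose energy is
    below valley_ratio * (window maximum). Assumed definition."""
    lo = max(0, center - half_window)
    hi = min(len(energies), center + half_window + 1)
    window = energies[lo:hi]
    cutoff = valley_ratio * max(window)
    return sum(1 for e in window if e < cutoff) / len(window)


def dilate(seq, radius):
    """1-D morphological dilation: sliding maximum (window clipped at edges)."""
    n = len(seq)
    return [max(seq[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def erode(seq, radius):
    """1-D morphological erosion: sliding minimum (window clipped at edges)."""
    n = len(seq)
    return [min(seq[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def closing(seq, radius):
    """Dilation followed by erosion; fills gaps narrower than the window."""
    return erode(dilate(seq, radius), radius)


def opening(seq, radius):
    """Erosion followed by dilation; removes runs narrower than the window."""
    return dilate(erode(seq, radius), radius)


def clean_mask(mask, radius):
    """Closing then opening on a 0/1 mask to drop short aberrant runs."""
    return opening(closing(mask, radius), radius)


def detect_pure_speech(energies, half_window=3, valley_ratio=0.2,
                       vp_threshold=0.5, radius=1):
    """Threshold the Valley Percentage per frame (1 = pure speech),
    then smooth the resulting binary mask. All defaults are illustrative."""
    mask = [
        int(valley_percentage(energies, i, half_window, valley_ratio) > vp_threshold)
        for i in range(len(energies))
    ]
    return clean_mask(mask, radius)
```

Because `dilate` and `erode` are written as sliding maximum and minimum, the same `closing` function also accepts a real-valued energy sequence. That is one possible way to sketch the pre-detection noise filter mentioned in the abstract, although the abstract does not say which representation of the signal that filter operates on.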

Bibliographic data

Publication number: US6205422B1
Publication date: 2001-03-20
Original format: PDF
Applicant/assignee: MICROSOFT CORPORATION
Application number: US19980201705
Inventors: CHUANG GU; WEI-GE CHEN; MING-CHIEH LEE
Filing date: 1998-11-30
Classification: G10L110/20; G10L110/00
Country: US
Record added: 2022-08-22 01:04:51
