Classification of audio as speech or non-speech using multiple threshold values
Abstract
A portion of an audio signal is separated into multiple frames from which one or more different features are extracted. These different features are used, in combination with a set of rules, to classify the portion of the audio signal into one of multiple different classifications (for example, speech, non-speech, music, environment sound, silence, etc.). In one embodiment, these different features include one or more of line spectrum pairs (LSPs), a noise frame ratio, periodicity of particular bands, spectrum flux features, and energy distribution in one or more of the bands. The line spectrum pairs are also optionally used to segment the audio signal, identifying audio classification changes as well as speaker changes when the audio signal is speech.
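The frame-split, feature-extraction, and rule-based classification flow described above can be illustrated with a minimal sketch. It is not the patented method: it uses only two features (mean frame energy and a simplified spectrum flux), and the threshold values `SILENCE_ENERGY` and `FLUX_THRESHOLD`, the frame length, and all function names are hypothetical choices made for this toy example. It also assumes, as a simplification, that speech shows larger frame-to-frame spectral change than steady non-speech audio.

```python
import cmath
import math

# Hypothetical thresholds for this toy example only; not taken from the patent.
SILENCE_ENERGY = 1e-4   # mean frame energy below this -> "silence"
FLUX_THRESHOLD = 0.01   # mean spectrum flux above this -> "speech"


def split_frames(signal, frame_len):
    """Split a signal into non-overlapping frames, dropping any partial tail."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2, scaled by 1/N (short frames only)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) / n
            for k in range(n // 2 + 1)]


def mean_energy(frames):
    """Average per-sample energy across all frames."""
    return sum(sum(x * x for x in f) / len(f) for f in frames) / len(frames)


def spectrum_flux(frames):
    """Mean squared difference between magnitude spectra of adjacent frames."""
    spectra = [magnitude_spectrum(f) for f in frames]
    diffs = [sum((a - b) ** 2 for a, b in zip(cur, prev)) / len(cur)
             for prev, cur in zip(spectra, spectra[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def classify(signal, frame_len=32):
    """Apply the threshold rules in order: silence, then speech, else non-speech."""
    frames = split_frames(signal, frame_len)
    if not frames:
        raise ValueError("signal shorter than one frame")
    if mean_energy(frames) < SILENCE_ENERGY:
        return "silence"
    if spectrum_flux(frames) > FLUX_THRESHOLD:
        return "speech"
    return "non-speech"


def tone(bin_index, n_frames, frame_len=32):
    """Synthetic sine with an integer number of cycles per frame (demo helper)."""
    return [math.sin(2 * math.pi * bin_index * t / frame_len)
            for t in range(n_frames * frame_len)]
```

For example, `classify(tone(4, 8))` sees identical spectra in every frame, so the flux stays near zero and the steady tone falls through to the non-speech rule, while an all-zero signal is caught by the energy threshold first. A real system would add the other features named in the abstract (line spectrum pairs, noise frame ratio, band periodicity, band energy distribution) and tune the thresholds on labeled data.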