Classification of audio as speech or non-speech using multiple threshold values

Abstract

A portion of an audio signal is separated into multiple frames from which one or more different features are extracted. These different features are used, in combination with a set of rules, to classify the portion of the audio signal into one of multiple different classifications (for example, speech, non-speech, music, environment sound, silence, etc.). In one embodiment, these different features include one or more of line spectrum pairs (LSPs), a noise frame ratio, periodicity of particular bands, spectrum flux features, and energy distribution in one or more of the bands. The line spectrum pairs are also optionally used to segment the audio signal, identifying audio classification changes as well as speaker changes when the audio signal is speech.
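The frame-then-classify flow described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names, the frame length, the two threshold values, and the toy rule that treats high spectrum flux as speech. Only two features are computed (frame energy and spectrum flux); line spectrum pairs, noise frame ratio, and band periodicity are omitted for brevity.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Split samples into consecutive non-overlapping frames, dropping any remainder."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Magnitudes of a naive DFT (bins 0..n/2); slow, but fine for short frames."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrum_flux(frames):
    """Mean squared difference between log-magnitude spectra of adjacent frames."""
    # The 1e-9 floor keeps log() finite for empty bins (an illustrative choice).
    spectra = [[math.log(m + 1e-9) for m in magnitude_spectrum(f)] for f in frames]
    diffs = [sum((a - b) ** 2 for a, b in zip(cur, prev)) / len(cur)
             for prev, cur in zip(spectra, spectra[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def classify(samples, frame_len=64, silence_thresh=1e-4, flux_thresh=1.0):
    """Toy two-threshold rule set; the thresholds are hypothetical placeholders."""
    frames = split_frames(samples, frame_len)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    mean_energy = sum(frame_energy(f) for f in frames) / len(frames)
    if mean_energy < silence_thresh:        # rule 1: very low energy
        return "silence"
    if spectrum_flux(frames) > flux_thresh:  # rule 2: rapidly changing spectrum
        return "speech"
    return "non-speech"
```

As a usage example, `classify([0.0] * 256)` returns `"silence"`, while a steady sine tone whose period divides the frame length yields near-zero flux and is labelled `"non-speech"`. A practical system would tune the thresholds on labelled audio and combine more features than this sketch does.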

Bibliographic data

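Publication number: US7249015B2
Patent type:
Publication date: 2007-07-24
Original format: PDF
Applicant/Patentee: HAO JIANG; HONG-JIANG ZHANG
Application number: US20060276419
Inventors: HAO JIANG; HONG-JIANG ZHANG
Filing date: 2006-02-28
Classification: G10L19/12
Country: US
Added to database: 2022-08-21 21:01:36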
