Journal: IEEE Transactions on Audio, Speech and Language Processing

Discrimination of Speech from Nonspeech Based on Multiscale Spectro-Temporal Modulations

Abstract

We describe a content-based audio classification algorithm built on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task is to discriminate speech from nonspeech, where nonspeech consists of animal vocalizations, music, and environmental sounds. Although this task is relatively easy for humans, it remains difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). The system's generalization to signals with high levels of additive noise and reverberation is evaluated and compared with two existing approaches (Scheirer and Slaney, 2002; Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.
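
The abstract reports results at low signal-to-noise ratios but does not say how the noisy test signals were produced. As a minimal sketch, assuming the standard definition SNR (dB) = 10 log10(P_signal / P_noise) with power taken as mean squared amplitude, the snippet below scales a noise recording so that a mixture reaches a chosen SNR. The function names, the 440 Hz test tone, and the 16 kHz sample rate are illustrative assumptions, not details from the paper.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, target_snr_db):
    """Scale `noise` so that signal + scaled noise has the target SNR in dB.

    Returns (mixture, scaled_noise). Hypothetical helper for illustration;
    the paper's actual noise-mixing procedure is not given in the abstract.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = power(signal)
    p_noise = power(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must both have nonzero power")
    # From SNR_dB = 10 * log10(P_signal / P_noise_scaled):
    target_noise_power = p_signal / (10 ** (target_snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(signal, noise):
    """SNR in dB between a clean signal and the noise added to it."""
    return 10 * math.log10(power(signal) / power(noise))


# Illustrative inputs: one second of a 440 Hz tone and white Gaussian noise
# at an assumed 16 kHz sample rate.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
white = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy, scaled = mix_at_snr(tone, white, target_snr_db=0.0)
print(round(measured_snr_db(tone, scaled), 6))
```

Sweeping `target_snr_db` over a range of values would give a set of test conditions of the kind the abstract alludes to. Reverberation is a separate distortion and is not modeled here.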