
Discrimination of Speech From Non-Speech Based on Multiscale Spectro-Temporal Modulations



Abstract

We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from non-speech sounds, which here consist of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it remains difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a Support Vector Machine (SVM). Generalization of the system to signals with high levels of additive noise and reverberation is evaluated and compared to two existing approaches [1] [2]. The results demonstrate the advantages of the auditory model over the other two systems, especially at low SNRs and high reverberation.
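Evaluating a classifier at low SNRs requires mixing noise into clean recordings at a controlled level. The sketch below shows one common way to do that, assuming the standard power-ratio definition of SNR in decibels. It is not taken from the paper: the function names `mix_at_snr` and `measured_snr_db`, the 200 Hz test tone, and the 8 kHz sampling rate are all hypothetical choices made for illustration.

```python
import math
import random


def mix_at_snr(signal, noise, snr_db):
    """Return signal + scaled noise so the mixture has the requested SNR in dB."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if signal_power == 0 or noise_power == 0:
        raise ValueError("signal and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_signal / (gain^2 * P_noise)); solve for gain.
    gain = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, mixture):
    """SNR of a mixture, given the clean signal it was built from."""
    residual = [m - c for c, m in zip(clean, mixture)]
    p_clean = sum(c * c for c in clean)
    p_resid = sum(r * r for r in residual)
    return 10 * math.log10(p_clean / p_resid)


rng = random.Random(0)
# Stand-in for a speech signal: a 200 Hz tone at 8 kHz (purely illustrative).
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
noisy = mix_at_snr(clean, noise, snr_db=-5.0)
```

Calling `measured_snr_db(clean, noisy)` recovers the requested level up to floating-point error, which gives a quick sanity check before such mixtures are fed to any classifier.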

Bibliographic details

  • Author

    Mesgarani, Nima

  • Year: 2005
  • Original format: PDF
  • Language: en_US
