Conference: Mexican International Conference on Artificial Intelligence

Using Values of the Human Cochlea in the Macro and Micro Mechanical Model for Automatic Speech Recognition


Abstract

Parametric representations based on cochlear behavior have recently been used in several studies on Automatic Speech Recognition (ASR), because the cochlea is the most important element in mammalian hearing for transducing the sound pressure received by the outer ear. This paper shows how the macro and micro mechanical model of the cochlea can be used in ASR tasks. The parameter values reported by Neely, Elliot and Ku in their work on the macro and micro mechanical model were used to set the central frequencies of a filter bank, so that parameters are extracted from speech in a way similar to how MFCC (Mel Frequency Cepstrum Coefficients) are constructed. We propose a new way of distributing the filter bank in the parametric representation, which yields a frequency-domain representation of speech different from that of MFCC. The response of the macro and micro mechanical model under each of these three sets of values was used to generate the central frequencies of the filter bank; the Mel scale function was thus replaced by a representation of the cochlear response based on the Neely model. The model was run with the different sets of cochlear parameters used by Neely, Elliot and Ku in their works, such as mass, damping and stiffness, among others. A performance of 98 to 100% was reached on a task using Spanish isolated digits pronounced by 5 different speakers. The approach was also applied to the neutral recordings of the SUSAS corpus, with some advantages in comparison with MFCC.
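The idea of replacing the Mel scale with a cochlear place-to-frequency map can be illustrated with a minimal sketch. It is not the paper's implementation: instead of the full macro and micro mechanical model response, it assumes the simpler undamped resonance of a single mass-spring section, f(x) = sqrt(k(x)/m) / (2π), with an exponentially decaying stiffness k(x) = k0·exp(-decay·x). The function names and the default values of `k0`, `decay` and `mass` are hypothetical placeholders, not the values reported by Neely, Elliot or Ku.

```python
import numpy as np

def place_to_freq(x, k0, decay, mass):
    """Undamped resonance (Hz) of a membrane section at position x.
    Assumes stiffness k(x) = k0 * exp(-decay * x) and constant mass."""
    return np.sqrt(k0 * np.exp(-decay * x) / mass) / (2.0 * np.pi)

def freq_to_place(f, k0, decay, mass):
    """Inverse of place_to_freq: position whose resonance is f (Hz)."""
    return -np.log(mass * (2.0 * np.pi * f) ** 2 / k0) / decay

def cochlear_filterbank(n_filters, n_fft, sample_rate, f_min, f_max,
                        k0=1.0e9, decay=4.0, mass=3.0e-3):
    """Triangular filters whose centers are equally spaced in cochlear
    position rather than on the Mel scale.
    k0, decay and mass are placeholder values (assumptions)."""
    # Low frequencies resonate far from the base, so go from far to near.
    x_far = freq_to_place(f_min, k0, decay, mass)
    x_near = freq_to_place(f_max, k0, decay, mass)
    positions = np.linspace(x_far, x_near, n_filters + 2)
    edges = place_to_freq(positions, k0, decay, mass)  # ascending in Hz

    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i:i + 3]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return edges[1:-1], bank

def cepstral_coefficients(frame, bank, n_fft, n_coeffs):
    """MFCC-style pipeline: power spectrum -> filter bank -> log -> DCT-II."""
    power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
    log_energy = np.log(bank @ power + 1e-10)  # small floor avoids log(0)
    n = bank.shape[0]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    dct_basis = np.cos(np.pi * k * (m + 0.5) / n)
    return dct_basis @ log_energy
```

Under this simplified stiffness law, equal steps in position give geometrically spaced center frequencies; a different parameter set or the full model response would change the spacing, which is the degree of freedom the abstract describes.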
