International Conference on Electrical Engineering and Information Communication Technology

Performance comparison of MFCC based bangla ASR system in presence and absence of third differential coefficients

Abstract

Current Mel Frequency Cepstral Coefficient (MFCC) based Bangla Automatic Speech Recognition (ASR) systems are mostly implemented with delta and acceleration coefficients. Together with the MFCCs and the log energy, these delta and acceleration coefficients give a 39-dimensional feature vector per 10 ms frame. In this paper, our objective is to observe the effect of third differential coefficients on the performance of Bangla ASR, which has not yet been explored in this field. To do so, we append 13 third differential coefficients to the previous 39 coefficients, giving a feature vector of 52 coefficients per 10 ms frame. We observe the performance of the Bangla ASR system in the presence and absence of third differential coefficients using a Hidden Markov Model (HMM) based tied-state triphone model. To build the speech corpus, 100 sentences were uttered by varying numbers of speakers in different phases, including both male and female speakers of similar ages (22 to 24). The Hidden Markov Model Toolkit (HTK) is used for the comparative analysis. We take the Sentence Correction Rate (SCR) as the performance indicator. The experiments show that the MFCC based systems of 39 (MFCC39) and 52 (MFCC52) dimensions have average SCRs of 98.89% and 98.94% respectively. Our finding, therefore, is that a slight improvement is possible with the inclusion of third differential coefficients when the sampling rate is as high as 44.1 kHz.
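The dimension counts in the abstract (13 static coefficients growing to 39 and then 52) can be illustrated with a minimal sketch. It assumes the regression-style differential commonly used for delta features, with a window of 2 frames and edge-frame replication; the abstract does not state the window size, so that value is an assumption, as are the function names `delta` and `stack_features` and the toy input data. It is not the authors' HTK configuration.

```python
def delta(frames, window=2):
    """Regression-based differential of a sequence of feature frames.

    frames: list of equal-length lists of floats, one list per frame.
    window: number of frames on each side (assumed value, not from the paper).
    Edge frames are replicated so the output keeps the same frame count.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(static, order):
    """Append differentials up to `order` to each static frame.

    order=2 gives static + delta + acceleration;
    order=3 additionally appends the third differential.
    """
    streams = [static]
    for _ in range(order):
        # Each stream is the differential of the previous one.
        streams.append(delta(streams[-1]))
    return [sum((s[t] for s in streams), []) for t in range(len(static))]


# Toy input: 6 frames of 13 static values (stand-ins for 12 MFCCs + log energy).
static = [[float(t * (d + 1)) for d in range(13)] for t in range(6)]
mfcc39 = stack_features(static, order=2)
mfcc52 = stack_features(static, order=3)
print(len(mfcc39[0]), len(mfcc52[0]))
```

Each additional differential order adds another 13 values per frame, which is where the step from 39 to 52 dimensions comes from.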
