A speech recognition model based on tri-phones for the Arabic language

Abstract

One way to maintain acceptable recognition accuracy as vocabulary size grows is to use sub-word base units rather than whole words. This paper presents a large-vocabulary continuous speech recognition system for Arabic that is based on tri-phones. To train and test the system, a dictionary was prepared and a 39-dimensional Mel Frequency Cepstral Coefficient (MFCC) feature vector was computed. The computation involves a Hamming window, Fourier transformation, Average Spectral Value (ASV), the logarithm of the ASV, normalized energy, and the first- and second-order time derivatives of the 13 coefficients. A combination of a Hidden Markov Model and a neural network approach was used to model the basic temporal nature of the speech signal. The recognizer was tested with 7841 tri-phones. With 13 coefficients the accuracy is 58%; with 39 coefficients it is 62%; with Cepstrum Mean Normalization it reaches 71%. Given the small amount of available data (only 620 sentences), these results are very encouraging.
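The step from 13 to 39 coefficients, and the mean normalization mentioned in the abstract, can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not give the derivative formula, so the regression-style delta over a window of two frames on each side is an assumption (it is one common convention), and the names `deltas`, `stack_39`, and `cepstral_mean_normalize` are hypothetical. The input is assumed to be a matrix with one row of 13 static coefficients per frame.

```python
import numpy as np


def deltas(feats: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based time derivative over +/- `width` frames (assumed formula)."""
    num_frames = feats.shape[0]
    # Repeat the first and last frame so edge frames have full context.
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_39(static: np.ndarray) -> np.ndarray:
    """Append first- and second-order derivatives to 13 static coefficients."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])


def cepstral_mean_normalize(feats: np.ndarray) -> np.ndarray:
    """Subtract the per-utterance mean of every coefficient."""
    return feats - feats.mean(axis=0, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    static = rng.normal(size=(50, 13))  # stand-in for real MFCC frames
    feats = cepstral_mean_normalize(stack_39(static))
    print(feats.shape)
```

Mean normalization is applied here per utterance, after stacking; whether the paper normalizes per utterance or per speaker is not stated in the abstract.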