International Conference on Informatics, Multimedia, Cyber and Information System

Audio Feature Extraction on SIBI Dataset for Speech Recognition

Abstract

Mel Frequency Cepstral Coefficients (MFCC) have been regarded as the standard feature extraction method for Automatic Speech Recognition (ASR) systems for the last few years. Their performance may be affected by several variables, such as the number of features, the audio channels, the filter width, and the type of filter bank used. In this paper, several comparisons were made to find the combination of variables that gives the best results on the SIBI (Indonesian Sign Language) dataset, which consists of sentences uttered by both Deaf and Hard of Hearing (DHH) and non-DHH speakers. In this experiment, although ASR performance on the DHH data is generally lower than on the non-DHH data, it is still relatively high: around 4.71% word error rate (WER) and 10.30% sentence error rate (SER), compared with 0.15% WER and 0.40% SER on the non-DHH data.
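
To make the variables named in the abstract concrete, the following is a minimal NumPy sketch of a textbook MFCC pipeline (pre-emphasis, framing, power spectrum, triangular mel filter bank, log, DCT). It is not the authors' implementation. The function names `mfcc` and `mel_filterbank` are hypothetical, and every default value (25 ms frames, 10 ms step, 26 filters, 13 coefficients, 0.97 pre-emphasis) is an assumed common choice, not a setting reported in the paper.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return MFCC features of shape (n_frames, n_coeffs) for a mono signal.

    All defaults are assumed, commonly used values, not the paper's settings.
    """
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies (0.97 is an assumed coefficient).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(emphasized) < flen:                 # pad very short signals
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    n_frames = 1 + (len(emphasized) - flen) // fstep

    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)

    # Power spectrum of each frame, then mel filter bank energies.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))

    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_energies @ dct_basis.T

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13)
```

In this sketch, `n_coeffs` corresponds to the number of features and `n_filters` to the size of the filter bank; changing the filter shape or spacing inside `mel_filterbank` would be one way to compare filter bank types. Multi-channel audio would need to be reduced to one channel (or processed per channel) before calling `mfcc`.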
