NAFOSTED Conference on Information and Computer Science

Improving Phonetic Recognition with Sequence-length Standardized MFCC Features and Deep Bi-Directional LSTM

Abstract

Phonetic recognition is one of the most challenging problems in speech analysis. Its applications include dialect identification [1], mispronunciation detection [2], and spoken document retrieval [3], among others. Approaches to these problems include improving feature selection on the input speech [4], applying deep learning techniques [5] [6] [7], or combining both [8]. Because phonetic data is sequential, architectures based on recurrent neural networks (RNNs) are an appropriate choice [9], and they become more powerful when combined with improved feature selection on the input data. In our approach, we combine Mel Frequency Cepstral Coefficients (MFCC) with sequence-length standardization to represent the acoustic features of speech, and we use several RNN models for phonetic classification. Our experiments are conducted on the Texas Instruments/Massachusetts Institute of Technology (TIMIT) [10] phone recognition dataset. In particular, our data processing and feature selection method gives consistently better results than other studies using the same neural network model. With a Bidirectional LSTM we currently achieve the lowest test error rate (13.05%), which is the best result on the TIMIT dataset, a reduction of about 3.5% over the previous best result [5] [6].
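The abstract does not say how the sequence length of the MFCC features is standardized. As a minimal sketch of one common way to do it, the example below truncates or zero-pads each segment to a fixed number of frames so that every segment has the same shape before it reaches a recurrent model. The function name `standardize_length`, the 13 coefficients per frame, the target of 20 frames, and the synthetic input values are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List

Frame = List[float]  # one frame of MFCC coefficients


def standardize_length(frames: List[Frame], target_frames: int, n_coeffs: int) -> List[Frame]:
    """Truncate or zero-pad a sequence of frames to exactly target_frames.

    Assumption: padding/truncation is only one possible reading of
    "sequence-length standardized"; the paper may use a different scheme.
    """
    # Keep at most target_frames frames (copying so the input is not mutated).
    fixed = [list(frame) for frame in frames[:target_frames]]
    # Append all-zero frames until the target length is reached.
    while len(fixed) < target_frames:
        fixed.append([0.0] * n_coeffs)
    return fixed


N_COEFFS = 13  # assumed number of MFCC coefficients per frame
TARGET = 20    # assumed fixed sequence length

# Synthetic stand-ins for real MFCC output: one short and one long segment.
short_segment = [[0.1 * i] * N_COEFFS for i in range(7)]
long_segment = [[0.1 * i] * N_COEFFS for i in range(25)]

batch = [standardize_length(s, TARGET, N_COEFFS) for s in (short_segment, long_segment)]
print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 20 13
```

After this step every segment has the same (frames, coefficients) shape, so segments can be stacked into a single batch for a bidirectional LSTM without per-sequence length handling.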
