Journal: Circuits, Systems, and Signal Processing

A Comparative Study of Deep Learning Techniques on Frame-Level Speech Data Classification

Abstract

This paper provides a comprehensive analysis of the effect of speaking rate on frame classification accuracy. Variation in speaking rate can degrade the performance of an automatic speech recognition system, yielding poor recognition accuracy. A model trained on speech at a normal rate recognizes normal-paced speech well but fails to achieve similar performance when tested on slow or fast speech. Our recent study showed a drop of almost ten percentage points in classification accuracy when a deep feed-forward network is trained on the normal speaking rate and evaluated on slow and fast speaking rates. In this paper, we extend that work to convolutional neural networks (CNN) to see whether this model can reduce the accuracy gap between speaking rates. Filter bank energies (FBE) and Mel frequency cepstral coefficients (MFCC) are evaluated on multiple CNN configurations, with the networks trained on the normal speaking rate and evaluated on slow and fast speaking rates. The results are compared to those obtained by a deep neural network. A breakdown of phoneme-level classification results and of the confusion between vowels and consonants is also presented. The experiments show that the CNN architecture, when used with FBE features, performs better at both slow and fast speaking rates. An improvement of nearly 2% for fast and 3% for slow speaking rates is observed.
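The evaluation described in the abstract (train at the normal rate, score frames at each rate, compare the gap) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `(rate, reference, predicted)` tuple layout, and the toy labels are all assumptions made for illustration, and the numbers it produces are not results from the paper.

```python
from collections import defaultdict


def accuracy_by_rate(utterances):
    """Frame-level accuracy per speaking rate.

    `utterances` is an iterable of (rate, reference_labels, predicted_labels),
    a hypothetical layout chosen for this sketch. Frames are pooled per rate,
    so longer utterances contribute more frames.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rate, reference, predicted in utterances:
        if len(reference) != len(predicted):
            raise ValueError("reference and predicted must have the same length")
        correct[rate] += sum(r == p for r, p in zip(reference, predicted))
        total[rate] += len(reference)
    return {rate: correct[rate] / total[rate] for rate in total}


def gap_in_points(accuracy, baseline="normal"):
    """Accuracy drop relative to the baseline rate, in percentage points."""
    return {
        rate: 100.0 * (accuracy[baseline] - value)
        for rate, value in accuracy.items()
        if rate != baseline
    }


# Toy, made-up frame labels (not data from the paper).
toy = [
    ("normal", ["a", "a", "t", "t"], ["a", "a", "t", "t"]),
    ("slow", ["a", "a", "a", "t"], ["a", "a", "t", "t"]),
    ("fast", ["a", "t"], ["t", "t"]),
]

acc = accuracy_by_rate(toy)
gaps = gap_in_points(acc)
print(acc)
print(gaps)
```

Reporting the gap in percentage points, as the abstract does for the feed-forward baseline, keeps the comparison between models independent of the absolute accuracy level.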
