Journal: Circuits, Systems, and Signal Processing

A Comparative Study of Deep Learning Techniques on Frame-Level Speech Data Classification


Abstract

This paper provides a comprehensive analysis of the effect of speaking rate on frame classification accuracy. Variation in speaking rate can degrade the performance of an automatic speech recognition system. A model trained on speech at a normal rate recognizes normal-paced speech well but fails to achieve similar performance when tested on slow or fast speech. Our recent study showed that classification accuracy drops by almost ten percentage points when a deep feed-forward network is trained on the normal speaking rate and evaluated on slow and fast speaking rates. In this paper, we extend that work to convolutional neural networks (CNNs) to see whether this model can reduce the accuracy gap between speaking rates. Filter bank energies (FBE) and Mel-frequency cepstral coefficients are evaluated on multiple CNN configurations, with the networks trained on the normal speaking rate and evaluated on slow and fast speaking rates. The results are compared with those obtained by a deep neural network. A breakdown of phoneme-level classification results and of the confusion between vowels and consonants is also presented. The experiments show that the CNN architecture, when used with FBE features, performs better on both slow and fast speaking rates: an improvement of nearly 2% for fast and 3% for slow speaking rates is observed.
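The evaluation described in the abstract (frame-level accuracy per speaking rate, the gap relative to the normal rate, and vowel/consonant confusion) can be illustrated with a minimal sketch. This is not the authors' code: the phoneme labels, the vowel set, the function names, and the toy frame sequences below are all hypothetical and chosen only to make the computation concrete.

```python
# Hypothetical toy vowel set; the paper's actual phoneme inventory
# is not given in the abstract.
VOWELS = {"aa", "iy", "uw"}


def frame_accuracy(reference, predicted):
    """Fraction of frames whose predicted label equals the reference label."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    correct = sum(1 for r, p in zip(reference, predicted) if r == p)
    return correct / len(reference)


def broad_class_confusion(reference, predicted, vowels=VOWELS):
    """2x2 count matrix: rows are the reference class, columns the predicted
    class; index 0 is vowel, index 1 is consonant."""
    counts = [[0, 0], [0, 0]]
    for r, p in zip(reference, predicted):
        counts[0 if r in vowels else 1][0 if p in vowels else 1] += 1
    return counts


# Made-up (reference, predicted) frame labels for each test condition.
# These are illustrative only and are not data from the paper.
eval_sets = {
    "normal": (["aa", "s", "iy", "t"], ["aa", "s", "iy", "t"]),
    "slow": (
        ["aa", "aa", "s", "s", "iy", "t", "uw", "uw"],
        ["aa", "iy", "s", "s", "iy", "t", "uw", "s"],
    ),
    "fast": (["aa", "s", "iy", "t"], ["aa", "t", "iy", "t"]),
}

# Accuracy per speaking rate, and the drop relative to the normal rate.
accuracy = {rate: frame_accuracy(ref, pred) for rate, (ref, pred) in eval_sets.items()}
gap = {rate: accuracy["normal"] - accuracy[rate] for rate in ("slow", "fast")}

# Vowel/consonant confusion for the slow condition.
slow_confusion = broad_class_confusion(*eval_sets["slow"])
```

In this sketch, `gap` holds the accuracy drop for each mismatched condition, which is the quantity the abstract reports as shrinking when the CNN is used with FBE features.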
