Circuits, Systems, and Signal Processing

Automatic Hypernasality Detection in Cleft Palate Speech Using CNN



Abstract

Automatic hypernasality detection in cleft palate speech can facilitate diagnosis by speech-language pathologists. This paper describes a feature-independent, end-to-end algorithm that uses a convolutional neural network (CNN) to detect hypernasality in cleft palate speech, taking a speech spectrogram as input. The average F1-scores for the hypernasality detection task are 0.9485 on a dataset spoken by children and 0.9746 on a dataset spoken by adults. The experiments explore the influence of spectral resolution on hypernasality detection performance in cleft palate speech: higher spectral resolution better highlights the vocal tract parameters associated with hypernasality, such as formants and spectral zeros. The CNN learns efficient features via two-dimensional filtering, whereas the feature extraction capability of shallow classifiers is limited; compared with a deep neural network and shallow classifiers, the CNN achieves the highest F1-score of 0.9485. Among the network architectures compared, a convolutional filter of size 1x8 achieves the highest F1-score in the hypernasality detection task; the 1x8 filter captures more frequency information and is better suited to hypernasality detection than filters of size 3x3, 4x4, 5x5, and 6x6. An analysis of hypernasality-sensitive vowels indicates that the vowel /i/ is the most sensitive to hypernasality. Compared with the state of the art, the proposed CNN-based system achieves better detection performance, and an experiment conducted on a heterogeneous corpus demonstrates that the CNN handles speech variability better than the shallow classifiers.
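The abstract reports that a 1x8 convolutional filter outperforms square filters because it covers more frequency information per response. A minimal numpy sketch of this idea (the spectrogram orientation, filter weights, and shapes here are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Plain 2-D valid cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

# Toy "spectrogram": rows = time frames, columns = frequency bins
spec = np.random.default_rng(0).random((40, 128))

# A 1x8 filter spans 8 adjacent frequency bins within one time frame,
# so a single response sees a band wide enough to cover a formant region,
# whereas a 3x3 filter sees only 3 bins at a time.
filt_1x8 = np.ones((1, 8)) / 8.0      # e.g. a band-averaging filter
resp = conv2d_valid(spec, filt_1x8)
print(resp.shape)  # (40, 121)
```

In a trained CNN the filter weights are learned rather than fixed averages; the sketch only illustrates why the 1x8 shape trades spatial (time) context for frequency context.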
