
Automatic Hypernasality Detection in Cleft Palate Speech Using CNN



Abstract

Automatic hypernasality detection in cleft palate speech can facilitate diagnosis by speech-language pathologists. This paper describes a feature-independent, end-to-end algorithm that uses a convolutional neural network (CNN) to detect hypernasality in cleft palate speech, with the speech spectrogram as input. The average F1-scores for the hypernasality detection task are 0.9485 on a dataset spoken by children and 0.9746 on a dataset spoken by adults. The experiments explore the influence of spectral resolution on hypernasality detection performance in cleft palate speech: higher spectral resolution better highlights the vocal tract parameters associated with hypernasality, such as formants and spectral zeros. The CNN learns efficient features via two-dimensional filtering, whereas the feature extraction capability of shallow classifiers is limited; compared with a deep neural network and shallow classifiers, the CNN achieves the highest F1-score of 0.9485. Among the network architectures compared, a convolutional filter of size 1×8 achieves the highest F1-score in the hypernasality detection task; this filter covers more frequency information and is more suitable for hypernasality detection than filters of size 3×3, 4×4, 5×5, and 6×6. An analysis of hypernasality-sensitive vowels indicates that the vowel /i/ is the most sensitive to hypernasality. Compared with the state-of-the-art literature, the proposed CNN-based system achieves better detection performance, and results of an experiment on a heterogeneous corpus demonstrate that the CNN handles speech variability better than the shallow classifiers.
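Below is a minimal sketch of the kind of pipeline the abstract describes: a magnitude spectrogram front end feeding a CNN binary classifier that uses 1×8 convolutional filters spanning the frequency axis. It is not the authors' implementation; the choice of PyTorch/torchaudio, the STFT parameters, layer counts, channel widths, and pooling scheme are assumptions made for illustration only.

import torch
import torch.nn as nn
import torchaudio

class HypernasalityCNN(nn.Module):
    """Binary CNN over a speech spectrogram (illustrative sketch, not the paper's exact network)."""
    def __init__(self):
        super().__init__()
        # 1x8 kernels: 1 frame along time, 8 bins along frequency, so each
        # filter emphasizes frequency-axis detail (formants, spectral zeros).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 8)), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
            nn.Conv2d(16, 32, kernel_size=(1, 8)), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse time and frequency
        self.classifier = nn.Linear(32, 2)   # hypernasal vs. normal speech

    def forward(self, x):
        # x: (batch, 1, time_frames, freq_bins)
        h = self.features(x)
        h = self.pool(h).flatten(1)
        return self.classifier(h)

# Spectrogram front end: a 512-point FFT yields 257 frequency bins; the FFT
# size controls the spectral resolution discussed in the abstract.
to_spec = torchaudio.transforms.Spectrogram(n_fft=512, hop_length=160)

waveform = torch.randn(1, 16000)                      # 1 s of dummy 16 kHz audio
spectrogram = to_spec(waveform)                       # (1, 257, time_frames)
x = spectrogram.log1p().transpose(1, 2).unsqueeze(0)  # (1, 1, time_frames, 257)

logits = HypernasalityCNN()(x)                        # (1, 2) class scores

With this layout, changing kernel_size to (3, 3) or (5, 5) is a one-line edit, which is how a filter-size comparison like the one in the abstract could be set up.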

Bibliographic Details

  • Source
    Circuits, Systems, and Signal Processing | 2019, Issue 8 | pp. 3521-3547 | 27 pages
  • Author Affiliations

    Sichuan Univ, Coll Elect Engn & Informat Technol, Chengdu, Sichuan, Peoples R China;

    Sichuan Univ, Coll Elect Engn & Informat Technol, Chengdu, Sichuan, Peoples R China;

    Sichuan Univ, Coll Elect Engn & Informat Technol, Chengdu, Sichuan, Peoples R China;

    Sichuan Univ, Hosp Stomatol, Chengdu, Sichuan, Peoples R China;

    Sichuan Univ, Coll Elect Engn & Informat Technol, Chengdu, Sichuan, Peoples R China;

    Sichuan Univ, Coll Elect Engn & Informat Technol, Chengdu, Sichuan, Peoples R China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English (eng)
  • Chinese Library Classification:
  • Keywords

    Cleft palate speech; Hypernasality; Convolutional neural network; End-to-end; Speech spectrogram;

