Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Silent Speech Recognition as an Alternative Communication Device for Persons With Laryngectomy


Abstract

Each year thousands of individuals require surgical removal of the larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to communicate verbally. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of speech musculature can be recorded from the neck and face, and used for automatic speech recognition to provide speech-to-text or synthesized speech as an alternative means of communication. This is true even when speech is mouthed or spoken in a silent (subvocal) manner, making it an appropriate communication platform after laryngectomy. In this study, eight individuals at least 6 months after total laryngectomy were recorded using eight sEMG sensors on their face (4) and neck (4) while reading phrases constructed from a 2500-word vocabulary. A unique set of phrases was used for training phoneme-based recognition models for each of the 39 commonly used phonemes in English, and the remaining phrases were used for testing word recognition of the models based on phoneme identification from running speech. Word error rates were on average 10.3% for the full eight-sensor set (averaging 9.5% for the top four participants), and 13.6% when reducing the sensor set to four locations per individual (n = 7). This study provides a compelling proof-of-concept for sEMG-based alaryngeal speech recognition, with strong potential to further improve recognition performance.
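The abstract reports results as word error rate (WER) but does not say how the score was computed. As a reference point, the following is a minimal sketch of the conventional definition: the word-level edit distance (substitutions, deletions, insertions) between the recognized phrase and the reference phrase, divided by the number of reference words. The function name `word_error_rate` and the example phrases are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: this is the conventional WER definition, not
    necessarily the exact scoring procedure used in the study.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word deleted
                curr[j - 1] + 1,                  # extra hypothesis word inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical phrases: one of six reference words is dropped.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 10.3% corresponds to roughly one word error for every ten reference words.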
