Integrating computational auditory scene analysis and automatic speech recognition.

Abstract

We present a schema-based model for phonemic restoration. The model employs missing-data ASR to decode speech based on unmasked portions and activates word templates that contain the masked phoneme via dynamic time warping. An activated template is then used to restore the masked phoneme. A systematic evaluation shows that the model is able to restore both voiced and unvoiced phonemes with a spectral quality close to that of original phonemes.

Missing-data ASR relies on a binary mask generated by bottom-up CASA to label the speech-dominant time-frequency (T-F) regions of a noisy mixture as reliable and the rest as unreliable. However, errors in mask estimation cause degradation in recognition accuracy. Hence, we propose a two-pass ASR system that performs segregation and recognition in tandem. In the first pass, an n-best lattice, consistent with bottom-up speech separation, is generated. The lattice is then re-scored using a model-based hypothesis test to improve mask estimation and recognition accuracy concurrently. This two-pass system leads to significant improvement in recognition performance.

By combining a monaural CASA system with missing-data ASR, we present a model that simulates listeners' ability to attend to a target speaker when degraded by the effects of energetic and informational masking in multitalker environments. Energetic masking refers to the phenomenon that a stronger signal masks a weaker one within a critical band. Informational masking occurs when the listener is unable to segregate target from interference. Missing-data ASR is used to account for energetic masking. The effects of informational masking are modeled by the output degradation of the CASA system in binary mask estimation. The model successfully simulates several quantitative aspects of listener performance including the differential effects of energetic and informational masking on multitalker perception.

While missing-data ASR performs well on small vocabulary tasks, previous studies have not examined the effect of vocabulary size. In this dissertation, we investigate the performance of the missing-data ASR on a larger vocabulary task and compare its results to those of conventional ASR. For conventional ASR, we extract the speech signal from a noisy mixture by estimating a Wiener filter based on estimated interaural time and intensity differences within a T-F unit. For missing-data ASR, the same estimation is used to produce a binary T-F mask. We find that while missing-data recognition outperforms conventional ASR on a small vocabulary task, the performance of conventional ASR is significantly better when the vocabulary size is increased. (Abstract shortened by UMI.)
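
The contrast between a binary T-F mask and a Wiener filter can be illustrated with a minimal sketch. It is not the dissertation's implementation: there, the estimates are derived from interaural time and intensity differences, whereas this sketch assumes that per-unit speech and noise power estimates are already available. The function names, the 0 dB threshold, and the toy values are assumptions made for illustration only.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero and log(0)

def binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Mark a T-F unit reliable (1) when its local SNR exceeds threshold_db.

    The 0 dB default is an assumption for this sketch.
    """
    snr_db = 10.0 * np.log10((speech_power + EPS) / (noise_power + EPS))
    return (snr_db > threshold_db).astype(np.uint8)

def wiener_gain(speech_power, noise_power):
    """Soft gain in [0, 1] per T-F unit: speech / (speech + noise)."""
    return speech_power / (speech_power + noise_power + EPS)

# Toy power estimates: rows are frequency channels, columns are time frames.
speech = np.array([[4.0, 1.0, 0.5],
                   [9.0, 0.1, 2.0]])
noise = np.array([[1.0, 1.0, 2.0],
                  [1.0, 1.0, 2.0]])

mask = binary_mask(speech, noise)   # hard reliable/unreliable labels
gain = wiener_gain(speech, noise)   # soft attenuation of each unit
```

Both quantities are computed from the same estimates: the binary mask discards the unreliable units for a missing-data recognizer, while the Wiener gain attenuates them so that a conventional recognizer can process the filtered signal.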
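
The template activation by dynamic time warping mentioned in the first paragraph of the abstract can be sketched in the same spirit. This is a textbook DTW with a Euclidean frame distance, not the dissertation's model: it ignores the masked portion of the input, and the helper names `dtw_distance` and `best_template` as well as the toy templates are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature frames (one frame per row)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def best_template(observed, templates):
    """Return the name of the template with the lowest DTW cost (hypothetical helper)."""
    return min(templates, key=lambda name: dtw_distance(observed, templates[name]))

# Toy one-dimensional "features"; real systems would use spectral frames.
observed = [[0.0], [1.0], [2.0], [2.0], [3.0]]
templates = {
    "rising": [[0.0], [1.0], [2.0], [3.0]],
    "falling": [[3.0], [2.0], [1.0], [0.0]],
}
chosen = best_template(observed, templates)
```

Because DTW allows frames to be repeated or skipped, the observed sequence aligns with the "rising" template at zero cost even though the two differ in length.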
