Published in: International Conference on Spoken Language

The influence of bigram constraints on word recognition by humans: implications for computer speech recognition


Abstract

The gap between human and machine performance on speech recognition tasks is still very large. Machine recognition accuracy for words in telephone conversations is only slightly better than 50%, based on results reported on the Switchboard corpus by leading researchers using state-of-the-art HMM systems. We know from our own experience that human perception typically delivers much more accurate word recognition over the telephone. Why is there such a large gap between machine and human performance, and what can be done to close this gap? One way to address this question is to study the sources of linguistic information in the speech signal that are known to be important for word recognition, and to measure how well machine systems use this information relative to humans. We measured the word recognition performance of listeners presented with words from the Switchboard corpus. Stimuli consisted of actual utterances excised from the Switchboard corpus, high-quality recordings of utterances that occurred in Switchboard conversations, and recordings of word sequences with zero, medium, and high bigram probabilities based on a language model computed from transcriptions of the Switchboard corpus. The results show that human listeners are very good at recognizing words in the absence of word sequence constraints, and that statistical language models fail to capture much of the high-level linguistic information needed to recognize words in fluent speech. The results are discussed in terms of their implications for current approaches to acoustic and language modeling in computer speech recognition.
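To make the notion of zero, medium, and high bigram probability concrete, the following is a minimal sketch of how a bigram probability can be estimated from transcriptions. It assumes plain unsmoothed maximum-likelihood estimates, P(w | v) = count(v, w) / count(v), which the abstract does not specify. The function names, the `<s>` start marker, and the toy transcripts are illustrative assumptions and are not taken from the Switchboard corpus or from the authors' language model.

```python
from collections import Counter


def train_bigram_model(transcripts):
    """Count word pairs and their left contexts over whitespace-tokenized utterances."""
    history_counts = Counter()  # how often each word appears as a left context
    bigram_counts = Counter()   # how often each (previous, current) pair appears
    for utterance in transcripts:
        tokens = ["<s>"] + utterance.lower().split()  # "<s>" marks the utterance start
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts


def bigram_probability(prev, word, history_counts, bigram_counts):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    seen = history_counts[prev]
    if seen == 0:
        return 0.0  # unseen left context: no estimate, treated here as zero
    return bigram_counts[(prev, word)] / seen


def mean_bigram_probability(sequence, history_counts, bigram_counts):
    """Average bigram probability over a word sequence (one possible summary score)."""
    tokens = ["<s>"] + sequence.lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    total = sum(bigram_probability(p, w, history_counts, bigram_counts) for p, w in pairs)
    return total / len(pairs)


# Invented toy transcripts, for illustration only.
toy_transcripts = [
    "i think so",
    "i think that is right",
    "i do not know",
]
history_counts, bigram_counts = train_bigram_model(toy_transcripts)

# "i think" occurs in the toy data; "i know" never occurs as an adjacent pair.
p_seen = bigram_probability("i", "think", history_counts, bigram_counts)
p_unseen = bigram_probability("i", "know", history_counts, bigram_counts)
```

Under these assumptions, a word pair that never occurs in the transcriptions gets probability zero, which corresponds to the zero-probability condition, while frequent pairs score higher. A practical language model would normally add smoothing or backoff so that unseen pairs are not assigned exactly zero.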
