Journal: Computer Speech and Language

'Early recognition' of polysyllabic words in continuous speech


Abstract

Humans are able to recognise a word before its acoustic realisation is complete. This is in contrast to conventional automatic speech recognition (ASR) systems, which compute the likelihood of a number of hypothesised word sequences and, in order to maximise efficiency and performance, identify the recognised words only from a trace-back of the hypothesis with the highest eventual score. In the present paper, we present SpeM, an ASR system based on principles known from the field of human word recognition, which models the human capability of 'early recognition' by computing word activation scores (based on negative log likelihood scores) during the speech recognition process. Experiments on 1463 polysyllabic words in 885 utterances showed that 64.0% (936) of these polysyllabic words were recognised correctly at the end of the utterance. For 81.1% of the 936 correctly recognised polysyllabic words, the local word activation allowed us to identify the word before its last phone was available, and 64.1% of those words were already identified one phone after their lexical uniqueness point. We investigated two types of predictors for deciding whether a word can be considered recognised before the end of its acoustic realisation. The first type is related to the absolute and relative values of the word activation, which trade false acceptances for false rejections. The second type is related to the number of phones of the word that have already been processed and the number of phones that remain until the end of the word. The results showed that SpeM's performance increases as the amount of acoustic evidence in support of a word increases and the risk of future mismatches decreases.
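
The two predictor types described in the abstract can be illustrated with a small decision rule. The following Python sketch is not SpeM's implementation: the `Candidate` class, the `early_decision` function, the threshold values, and the example words and scores are all hypothetical, and the activation is simply assumed to be a positive number where higher means stronger support. It only shows how an absolute activation threshold, a relative threshold (ratio to the best competitor), and phone counts might be combined to decide whether to accept a word before its last phone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical word candidate at some point during recognition."""
    word: str
    activation: float   # assumed positive; higher means stronger support
    phones_seen: int    # phones of the word already processed
    phones_total: int   # phones in the word's full transcription


def early_decision(
    candidates: List[Candidate],
    min_activation: float = 0.5,   # assumed absolute threshold
    min_ratio: float = 1.5,        # assumed ratio to the runner-up
    min_phones_seen: int = 3,      # assumed minimum acoustic evidence
    max_phones_left: int = 2,      # assumed limit on remaining phones
) -> Optional[str]:
    """Return a word to accept early, or None to keep listening."""
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c.activation, reverse=True)
    best = ranked[0]
    runner_up = ranked[1].activation if len(ranked) > 1 else 0.0

    # Predictor type 1a: absolute value of the word activation.
    if best.activation < min_activation:
        return None
    # Predictor type 1b: activation relative to the best competitor.
    if runner_up > 0 and best.activation / runner_up < min_ratio:
        return None
    # Predictor type 2: phones already processed and phones still to come.
    if best.phones_seen < min_phones_seen:
        return None
    if best.phones_total - best.phones_seen > max_phones_left:
        return None
    return best.word


if __name__ == "__main__":
    hypotheses = [
        Candidate("amsterdam", activation=0.9, phones_seen=6, phones_total=8),
        Candidate("amstelveen", activation=0.4, phones_seen=5, phones_total=9),
    ]
    print(early_decision(hypotheses))
```

Raising `min_activation` or `min_ratio` in this sketch rejects more words early (fewer false acceptances, more false rejections), which is the trade-off the abstract attributes to the first predictor type.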
