Journal: Computer Speech and Language

Leveraging Linguistic Context in Dyadic Interactions to Improve Automatic Speech Recognition for Children


Abstract

Automatic speech recognition for child speech has long been considered a more challenging problem than for adult speech. Various contributing factors have been identified, such as larger acoustic speech variability, including mispronunciations due to ongoing biological growth, developing vocabulary and linguistic skills, and scarcity of training corpora. A further challenge arises when dealing with the spontaneous speech of children engaged in a conversational interaction, especially when the child may have limited or impaired communication ability. This includes health applications, one of the motivating domains of this paper, that involve goal-oriented dyadic interactions between a child and a clinician/adult social partner as part of a behavioral assessment. In this work, we use linguistic context information from the interaction to adapt speech recognition models for child speech. Specifically, the spoken language of the interacting adult provides the context for the child's speech. We propose two methods to exploit this context: lexical repetitions and semantic response generation. For the latter, we make use of sequence-to-sequence models that learn to predict the target child utterance given the context adult utterances. Long-term context is incorporated in the model by propagating the cell state across the duration of the conversation. We use interpolation techniques to adapt language models at the utterance level, and analyze the effect of the length and direction of context (forward and backward). Two different domains are used in our experiments to demonstrate the generalized nature of our methods: interactions between a child with ASD and an adult social partner in a play-based, naturalistic setting, and forensic interviews between a child and a trained interviewer.
In both cases, context-adapted models yield significant improvement (up to 10.71% absolute in word error rate) over the baseline and perform consistently across context windows and directions. Using statistical analysis, we investigate the effect of source-based (adult) and target-based (child) factors on the adaptation methods. Our results demonstrate the applicability of our modeling approach to improving child speech recognition by employing information transfer from the adult interlocutor.
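The utterance-level language-model interpolation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is a toy example, assuming simple unigram distributions and a hypothetical interpolation weight `lam`, showing how a background child-speech LM might be mixed with a context LM estimated from the adult interlocutor's preceding utterances:

```python
def interpolate_lm(p_background, p_context, lam=0.4):
    """Linearly interpolate two LM distributions over a shared vocabulary:
    P_adapted(w) = lam * P_context(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_background) | set(p_context)
    return {
        w: lam * p_context.get(w, 0.0) + (1 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }

# Toy unigram distributions (hypothetical values). The context LM boosts
# words the adult just used, capturing the lexical-repetition effect.
background = {"ball": 0.2, "dog": 0.5, "truck": 0.3}   # generic child-speech LM
context    = {"ball": 0.6, "dog": 0.1, "truck": 0.3}   # from adult's utterances

adapted = interpolate_lm(background, context, lam=0.4)
# "ball" gains probability mass relative to the background model alone.
```

In practice this interpolation would be applied over n-gram or neural LM probabilities rather than toy unigrams, with the weight tuned per utterance or on held-out data.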
