Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process


Abstract

An End-Of-Turn Detection Module (EOTD-M) is an essential component of automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user's utterance has ended or not improves the accuracy in interpreting the meaning of the message and decreases the latency in the answer. Usually, in dialogue systems, an EOTD-M is coupled with an Automatic Speech Recognition Module (ASR-M) to transmit complete utterances to the Natural Language Understanding unit. Mistakes in the ASR-M transcription can have a strong effect on the performance of the EOTD-M. The actual extent of this effect depends on the particular combination of ASR-M transcription errors and the sentence featurization techniques implemented as part of the EOTD-M. In this paper we investigate this important relationship for an EOTD-M based on semantic information and particular characteristics of the speakers (speech profiles). We introduce an Automatic Speech Recognition Simulator (ASR-SIM) that models different types of semantic mistakes in the ASR-M transcription as well as different speech profiles. We use the simulator to evaluate the sensitivity to ASR-M mistakes of a Long Short-Term Memory network classifier trained in EOTD with different featurization techniques. Our experiments reveal the different ways in which the performance of the model is influenced by the ASR-M errors. We corroborate that not only is the ASR-SIM useful to estimate the performance of an EOTD-M in customized noisy scenarios, but it can also be used to generate training datasets with the expected error rates of real working conditions, which leads to better performance.

Bibliographic record

  • Source
    Engineering Applications of Artificial Intelligence | 2021, Issue 4 | 104189.1-104189.12 | 12 pages
  • Author affiliations

    Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Paseo Manuel de Lardizabal 1, 20018 Donostia-San Sebastian, Spain;

    Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Paseo Manuel de Lardizabal 1, 20018 Donostia-San Sebastian, Spain;

    Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Paseo Manuel de Lardizabal 1, 20018 Donostia-San Sebastian, Spain, and Basque Center for Applied Mathematics (BCAM), Bilbao, Spain;

  • Indexing information
  • Original format: PDF
  • Language of full text: eng
  • Chinese Library Classification
  • Keywords

    Spoken dialogue systems; Automatic speech recognition; End of turn detection; Natural language processing; Neural networks;

