Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process

Cesar Montenegro; Roberto Santana; Jose A. Lozano

Abstract

An End-Of-Turn Detection Module (EOTD-M) is an essential component of automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user's utterance has ended or not improves the accuracy in interpreting the meaning of the message and decreases the latency in the answer. Usually, in dialogue systems, an EOTD-M is coupled with an Automatic Speech Recognition Module (ASR-M) to transmit complete utterances to the Natural Language Understanding unit. Mistakes in the ASR-M transcription can have a strong effect on the performance of the EOTD-M. The actual extent of this effect depends on the particular combination of ASR-M transcription errors and the sentence featurization techniques implemented as part of the EOTD-M. In this paper we investigate this important relationship for an EOTD-M based on semantic information and particular characteristics of the speakers (speech profiles). We introduce an Automatic Speech Recognition Simulator (ASR-SIM) that models different types of semantic mistakes in the ASR-M transcription as well as different speech profiles. We use the simulator to evaluate the sensitivity to ASR-M mistakes of a Long Short-Term Memory network classifier trained in EOTD with different featurization techniques. Our experiments reveal the different ways in which the performance of the model is influenced by the ASR-M errors. We corroborate that not only is the ASR-SIM useful to estimate the performance of an EOTD-M in customized noisy scenarios, but it can also be used to generate training datasets with the expected error rates of real working conditions, which leads to better performance.
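The idea of evaluating an EOTD-M on transcriptions corrupted at a controlled error rate can be illustrated with a minimal sketch. This is not the ASR-SIM described in the paper: the abstract does not specify its error model, so the function name `simulate_asr_errors`, the three error types (substitution, deletion, insertion), the uniform choice among them, and the flat replacement vocabulary are all assumptions made here for illustration only.

```python
import random


def simulate_asr_errors(words, error_rate, vocabulary, rng):
    """Return a noisy copy of `words` with simulated word-level ASR errors.

    Hypothetical sketch: each word is hit by an error with probability
    `error_rate`; the error type is then picked uniformly at random.
    """
    noisy = []
    for word in words:
        if rng.random() >= error_rate:
            noisy.append(word)  # word transcribed correctly
            continue
        kind = rng.choice(("substitute", "delete", "insert"))
        if kind == "substitute":
            # Replace the word with a random vocabulary entry
            # (which may, by chance, equal the original word).
            noisy.append(rng.choice(vocabulary))
        elif kind == "insert":
            # Keep the word and add a spurious one after it.
            noisy.append(word)
            noisy.append(rng.choice(vocabulary))
        # "delete": the word is simply dropped.
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible noise
    clean = "could you book a table for two".split()
    noisy = simulate_asr_errors(clean, 0.2, ["the", "to", "you", "table"], rng)
    print(" ".join(noisy))
```

Applying such a function to a clean corpus at several values of `error_rate` would give both noisy test sets for measuring sensitivity and noisy training sets matched to an expected error rate, which is the use the abstract describes for the simulator.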

Bibliographic details

Source
Engineering Applications of Artificial Intelligence | 2021, Issue 4 | 104189.1-104189.12 | 12 pages
Authors
Cesar Montenegro; Roberto Santana; Jose A. Lozano
Author affiliations

Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Paseo Manuel de Lardizabal 1, 20018 Donostia-San Sebastian, Spain;

Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Paseo Manuel de Lardizabal 1, 20018 Donostia-San Sebastian, Spain;

Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Paseo Manuel de Lardizabal 1, 20018 Donostia-San Sebastian, Spain; Basque Center for Applied Mathematics (BCAM), Bilbao, Spain;

Indexing information
Original format: PDF
Language: English
Keywords
Spoken dialogue systems; Automatic speech recognition; End of turn detection; Natural language processing; Neural networks;

