Journal: IEEE Transactions on Audio, Speech, and Language Processing

Out-of-Domain Utterance Detection Using Classification Confidences of Multiple Topics


Abstract

One significant problem for spoken language systems is how to cope with users' out-of-domain (OOD) utterances, which cannot be handled by the back-end application system. In this paper, we propose a novel OOD detection framework that uses the classification confidence scores of multiple topics and applies a linear discriminant model to perform in-domain verification. The verification model is trained using a combination of deleted interpolation of the in-domain data and minimum-classification-error training. It does not require actual OOD data during training, which makes it highly portable. When applied to the "phrasebook" system, a single-utterance, read-style speech task, the proposed approach achieves an absolute reduction in OOD detection errors of up to 8.1 points (40% relative) compared to a baseline method based on the maximum topic classification score. Furthermore, the proposed approach achieves performance comparable to an equivalent system trained on both in-domain and OOD data, while requiring no OOD data during training. We also apply this framework to the "machine-aided-dialogue" corpus, a spontaneous dialogue speech task, and extend the framework in two ways. First, we introduce topic clustering, which enables reliable topic confidence scores to be generated even for indistinct utterances. Second, we implement methods to effectively incorporate dialogue context. Integrating these two methods into the proposed framework significantly improves OOD detection performance, achieving a further reduction in equal error rate (EER) of 7.9 points
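
The contrast between the baseline and the proposed verification step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, weights, and thresholds are hypothetical, and the training procedure described in the abstract (deleted interpolation plus minimum-classification-error training) is not shown. The sketch only assumes that each utterance comes with one confidence score per topic.

```python
from typing import Sequence


def baseline_in_domain(confidences: Sequence[float], threshold: float) -> bool:
    """Baseline decision: accept the utterance as in-domain when the
    single highest topic confidence reaches the threshold."""
    return max(confidences) >= threshold


def linear_in_domain(
    confidences: Sequence[float],
    weights: Sequence[float],
    threshold: float,
) -> bool:
    """Linear discriminant decision: accept the utterance as in-domain when
    the weighted sum of all topic confidences reaches the threshold.

    The weights and threshold are placeholders here; in the paper they are
    learned, which this sketch does not attempt to reproduce.
    """
    if len(confidences) != len(weights):
        raise ValueError("need exactly one weight per topic confidence")
    score = sum(w * c for w, c in zip(weights, confidences))
    return score >= threshold


# Hypothetical scores for three topics (illustrative values only).
spread_out = [0.40, 0.35, 0.30]   # moderate evidence for several topics
weak = [0.10, 0.05, 0.10]         # little evidence for any topic
uniform_weights = [1.0, 1.0, 1.0]

print(baseline_in_domain(spread_out, threshold=0.5))
print(linear_in_domain(spread_out, uniform_weights, threshold=0.9))
print(linear_in_domain(weak, uniform_weights, threshold=0.9))
```

With these made-up numbers, the max-score baseline rejects the first utterance because no single topic is confident enough, while the weighted sum accepts it because evidence is spread across several topics. This is the kind of case where combining multiple topic confidences can help.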
