Using Deep-Q Network to Select Candidates from N-best Speech Recognition Hypotheses for Enhancing Dialogue State Tracking

Abstract

Most state-of-the-art dialogue state tracking (DST) methods infer the dialogue state from ground-truth transcriptions of utterances. In real-world settings, utterances are transcribed by automatic speech recognition (ASR) systems, which output the n-best candidate transcriptions (hypotheses). In noisy environments, the best transcription is often imperfect, which severely degrades DST accuracy and may cause the dialogue system to stall or loop. The missed or misrecognized words can often be found in the runner-up candidate transcriptions, ranked 2 to n, which could be used to improve DST accuracy. However, looking beyond the top-ranked ASR result poses a dilemma: going too far may introduce noise, while not going far enough may fail to uncover any useful information. In this paper, we propose a novel approach, based on deep reinforcement learning, that automatically determines the optimal time to stop re-examining runner-up ASR transcriptions. Our method outperforms the baseline system, which uses only the top-1 ASR result, by 3.1%. On the dialogue rounds with the top-10 largest word error rates (WER), our method improves DST accuracy by 15.4%, about five times the overall improvement (3.1%). This improvement is expected because the proposed method can select informative ASR results at any rank.
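
The stop-or-continue decision described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the two-action layout (`CONTINUE`, `STOP`), and in particular `toy_q_values`, which stands in for a trained Deep-Q network, are all assumptions made for illustration. The sketch only shows the control flow of walking down an n-best list and stopping once the estimated value of stopping is at least the value of continuing.

```python
from typing import Callable, List, Sequence

# Action indices (hypothetical layout, not taken from the paper).
CONTINUE, STOP = 0, 1

QFunction = Callable[[List[str], str], Sequence[float]]


def select_hypotheses(nbest: Sequence[str], q_values: QFunction) -> List[str]:
    """Walk the n-best list from rank 1 and stop when Q(STOP) >= Q(CONTINUE)."""
    if not nbest:
        return []
    selected = [nbest[0]]  # the top-1 result is always kept, as in the baseline
    for candidate in nbest[1:]:
        q = q_values(selected, candidate)
        if q[STOP] >= q[CONTINUE]:
            break  # going further is judged more likely to add noise
        selected.append(candidate)
    return selected


def toy_q_values(selected: List[str], candidate: str) -> Sequence[float]:
    """Hypothetical stand-in for a trained Deep-Q network.

    It favors continuing only while the next candidate contributes
    words that have not been seen in the selected hypotheses.
    """
    seen = {word for hyp in selected for word in hyp.split()}
    new_words = [word for word in candidate.split() if word not in seen]
    return (float(len(new_words)), 0.5)  # (Q(CONTINUE), Q(STOP))


if __name__ == "__main__":
    nbest = [
        "i want a cheap restaurant",
        "i want a cheap restaurant north",
        "i want a cheap restaurant north",
        "i want sheep restaurant",
    ]
    print(select_hypotheses(nbest, toy_q_values))
```

With the toy scorer, the walk keeps the first two hypotheses and stops at rank 3, which adds no new words. In a real system the scorer would be a learned network whose reward reflects downstream DST accuracy; that training setup is outside the scope of this sketch.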

Bibliographic details

Source: 2019, pp. 7375-7379 (5 pages)
Authors: Richard Tzong-Han Tsai; Chia-Hao Chen; Chun-Kai Wu; Yu-Cheng Hsiao; Hung-yi Lee
Original format: PDF
Keywords: interactive systems; learning (artificial intelligence); speech recognition
Indexed: 2022-08-26 14:46:00

Similar literature

1. An N-best candidates-based discriminative training for speech recognition applications [J]. Jung-Kuei Chen, Soong F.K. IEEE Transactions on Speech and Audio Processing. 1994, Issue 1
2. A fast method for finding the exact N-best hypotheses for multitarget tracking [J]. Danchick R., Newnam G.E. IEEE Transactions on Aerospace and Electronic Systems. 1993, Issue 2
3. Morpho-syntactic post-processing of N-best lists for improved French automatic speech recognition [J]. Stephane Huet, Guillaume Gravier, Pascale Sebillot. Computer Speech and Language. 2010, Issue 4
4. Using Deep-Q Network to Select Candidates from N-best Speech Recognition Hypotheses for Enhancing Dialogue State Tracking [C]. Richard Tzong-Han Tsai, Chia-Hao Chen, Chun-Kai Wu. IEEE International Conference on Acoustics, Speech and Signal Processing. 2019
5. Explicit N-best formant features for segment-based speech recognition [D]. Schmid, Philipp Heinz. 1996
6. Visual Input Enhances Selective Speech Envelope Tracking in Auditory Cortex at a Cocktail Party [O]. Elana Zion Golumbic, Gregory B. Cogan, Charles E. Schroeder. 2013
7. Improving state-of-the-art continuous speech recognition systems using the N-best paradigm with neural networks [O]. S. Austin, G. Zavaliagkos, J. Makhoul. 1992
8. Improving State-of-the-Art Continuous Speech Recognition System Using the N-Best Paradigm with Neural Networks [R]. Austin, S., Zavaliagkos, G., Makhoul, J. 1992
