Journal: IEICE Transactions on Information and Systems

In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer

Abstract

Voice-enabled car navigation systems that rely on a multi-purpose cloud speech recognition service (cloud ASR) need utterance classification that is robust against speech recognition errors in order to provide a user-friendly voice interface. This study aims to improve the accuracy of utterance classification for such systems when the classifier's inputs are error-prone recognition results obtained from a cloud ASR. The role of utterance classification is to predict, from a spontaneous utterance, which car navigation function the user wants to execute. Noise that occurs while a car is traveling causes a cloud ASR to make recognition errors, and these errors degrade the accuracy of utterance classification. Many methods reduce the number of speech recognition errors by modifying the internals of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize them. In this paper, we propose a system that improves the accuracy of utterance classification by modifying both the speech-signal inputs to a cloud ASR and the recognized-sentence outputs from it. First, our system performs speech enhancement on a user's utterance and then sends both the enhanced and the non-enhanced speech signals to a cloud ASR. The recognition results from the two signals are merged to reduce the number of recognition errors. Second, to reduce the number of utterance classification errors, we propose a data augmentation method, which we call “optimal doping,” in which not only accurate transcriptions but also error-prone recognized sentences are added to the training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% relative to a baseline condition. Finally, we propose a semi-automatic upgrading approach that lets classifiers benefit from improvements in cloud ASR performance.
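
The data augmentation idea in the abstract (training on accurate transcriptions together with error-prone recognized sentences) can be illustrated with a minimal sketch. Everything below is hypothetical: the function name `dope_training_set`, the `asr_ratio` parameter, and the sample sentences and intent labels do not come from the paper, and the abstract does not say how the "optimal" amount of recognized sentences is chosen. Here the share is simply passed in by the caller, who could, for instance, tune it on held-out data.

```python
import random
from typing import List, Tuple

# One training example: (sentence text, intent label).
Example = Tuple[str, str]


def dope_training_set(
    transcripts: List[Example],
    asr_outputs: List[Example],
    asr_ratio: float,
    seed: int = 0,
) -> List[Example]:
    """Return all clean transcripts plus a random share of ASR outputs.

    asr_ratio is an assumed knob: the fraction of the available
    error-prone ASR sentences to mix into the training data.
    """
    if not 0.0 <= asr_ratio <= 1.0:
        raise ValueError("asr_ratio must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    n_asr = int(asr_ratio * len(asr_outputs))
    sampled = rng.sample(asr_outputs, n_asr)
    return list(transcripts) + sampled


# Hypothetical data: manual transcriptions and misrecognized variants.
transcripts = [
    ("navigate to the nearest gas station", "set_destination"),
    ("turn up the volume", "audio_volume"),
]
asr_outputs = [
    ("navigate to the nearest gas nation", "set_destination"),
    ("turn up the column", "audio_volume"),
    ("navy gate to the nearest gas station", "set_destination"),
    ("turn of the volume", "audio_volume"),
]

doped = dope_training_set(transcripts, asr_outputs, asr_ratio=0.5)
print(len(doped), "training examples")
```

The resulting list can be fed to any text classifier. Keeping the clean transcriptions intact and varying only the share of recognized sentences makes it straightforward to compare classification accuracy across several values of `asr_ratio`.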
