Improving Human-computer Interaction in Low-resource Settings with Text-to-phonetic Data Augmentation

IEEE International Conference on Acoustics, Speech and Signal Processing



Abstract

Off-the-shelf speech recognition systems can yield useful results and accelerate application development, but general-purpose systems applied to specialized domains can introduce acoustically small but semantically catastrophic errors. Furthermore, sufficient audio data may not be available to develop custom acoustic models for niche tasks. To address these problems, we propose a concept to improve performance in text classification tasks that use speech transcripts as input, without any in-domain audio data. Our method augments available typewritten text training data with inferred phonetic information so that the classifier will learn semantically important acoustic regularities, making it more robust to transcription errors from the general-purpose ASR. We successfully pilot our method in a speech-based virtual patient used for medical training, recovering up to 62% of errors incurred by feeding a small test set of speech transcripts to a classification model trained on typescript.
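The augmentation idea described in the abstract is straightforward to prototype: run each typed training utterance through a grapheme-to-phoneme (G2P) converter and train the classifier on both the original text and its inferred phonetic rendering. The sketch below is a minimal illustration only; the paper does not specify its G2P component or feature combination, so the use of the off-the-shelf g2p_en package, the `phonetic_augment` helper, and the virtual-patient intent label are all assumptions made for this example.

```python
# Minimal sketch of text-to-phonetic data augmentation, assuming the
# off-the-shelf g2p_en grapheme-to-phoneme converter (the paper does not
# name its G2P model; identifiers below are illustrative).
from g2p_en import G2p

g2p = G2p()

def phonetic_augment(utterance):
    """Return the typed utterance plus an inferred phonetic rendering."""
    # g2p_en emits ARPAbet phoneme symbols, with " " tokens between words.
    phones = [p for p in g2p(utterance) if p != " "]
    return [utterance, " ".join(phones)]

# Hypothetical typed training pair for a virtual-patient intent classifier.
train_pairs = [("do you have any chest pain", "ask_chest_pain")]

# Each typed example yields two training instances: the original text and
# its phoneme sequence, so the classifier can pick up acoustic regularities
# without any in-domain audio data.
augmented = [(variant, label)
             for text, label in train_pairs
             for variant in phonetic_augment(text)]

for variant, label in augmented:
    print(label, "->", variant)
```

Training on both surface forms is what makes the downstream classifier more forgiving of ASR output: a transcript that is acoustically close to a training utterance maps to a similar phoneme sequence even when its spelling diverges.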
