
End-to-end ASR to jointly predict transcriptions and linguistic annotations



Abstract

We propose a Transformer-based sequence-to-sequence model for automatic speech recognition (ASR) capable of simultaneously transcribing and annotating audio with linguistic information such as phonemic transcripts or part-of-speech (POS) tags. Since linguistic information is important in natural language processing (NLP), the proposed ASR is especially useful for speech interface applications, including spoken dialogue systems and speech translation, which combine ASR and NLP. To produce linguistic annotations, we train the ASR system using modified training targets: each grapheme or multi-grapheme unit in the target transcript is followed by an aligned phoneme sequence and/or POS tag. Since our method has access to the underlying audio data, we can estimate linguistic annotations more accurately than pipeline approaches in which NLP-based methods are applied to a hypothesized ASR transcript. Experimental results on Japanese and English datasets show that the proposed ASR system is capable of simultaneously producing high-quality transcriptions and linguistic annotations.
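The modified training targets described above can be illustrated with a minimal sketch. The abstract does not specify how annotation tokens are marked, so the `<ph:...>` and `<pos:...>` token formats, the `Unit` tuple layout, and the function names `build_target` and `strip_annotations` are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout of one annotated unit:
# (grapheme or multi-grapheme unit, aligned phoneme sequence, POS tag or None).
Unit = Tuple[str, Sequence[str], Optional[str]]


def build_target(
    units: Sequence[Unit],
    use_phonemes: bool = True,
    use_pos: bool = True,
) -> List[str]:
    """Interleave each grapheme unit with its aligned annotations.

    Each unit is followed by its phoneme tokens and/or its POS tag,
    giving a single flat token sequence a seq2seq model could be trained on.
    The "<ph:...>" and "<pos:...>" markers are an assumed convention.
    """
    target: List[str] = []
    for graphemes, phonemes, pos in units:
        target.append(graphemes)
        if use_phonemes:
            target.extend(f"<ph:{p}>" for p in phonemes)
        if use_pos and pos is not None:
            target.append(f"<pos:{pos}>")
    return target


def strip_annotations(target: Sequence[str]) -> List[str]:
    """Recover the plain transcript by dropping the assumed annotation tokens."""
    return [t for t in target if not t.startswith(("<ph:", "<pos:"))]


if __name__ == "__main__":
    # Toy example with made-up phoneme and POS labels.
    units = [
        ("the", ["DH", "AH"], "DET"),
        ("cat", ["K", "AE", "T"], "NOUN"),
    ]
    annotated = build_target(units)
    print(annotated)
    print(strip_annotations(annotated))
```

Because the annotations live in the same output sequence as the transcript, the plain transcription can be recovered by filtering out the annotation tokens, as `strip_annotations` does under the assumed marker convention.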
