End-to-end ASR to jointly predict transcriptions and linguistic annotations

Abstract

We propose a Transformer-based sequence-to-sequence model for automatic speech recognition (ASR) capable of simultaneously transcribing and annotating audio with linguistic information such as phonemic transcripts or part-of-speech (POS) tags. Since linguistic information is important in natural language processing (NLP), the proposed ASR is especially useful for speech interface applications, including spoken dialogue systems and speech translation, which combine ASR and NLP. To produce linguistic annotations, we train the ASR system using modified training targets: each grapheme or multi-grapheme unit in the target transcript is followed by an aligned phoneme sequence and/or POS tag. Since our method has access to the underlying audio data, we can estimate linguistic annotations more accurately than pipeline approaches in which NLP-based methods are applied to a hypothesized ASR transcript. Experimental results on Japanese and English datasets show that the proposed ASR system is capable of simultaneously producing high-quality transcriptions and linguistic annotations.
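The modified training targets described in the abstract can be illustrated with a small sketch. It interleaves each grapheme unit with its aligned phoneme sequence and POS tag, and then splits a decoded token sequence back into transcript and annotations. The `<ph:...>` and `<pos:...>` token formats, the `AnnotatedUnit` class, and both function names are assumptions made for illustration; the abstract does not specify the actual token inventory or delimiters used by the authors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnnotatedUnit:
    """One grapheme or multi-grapheme unit with its aligned annotations (hypothetical type)."""
    graphemes: str
    phonemes: List[str] = field(default_factory=list)
    pos: Optional[str] = None


def build_target(units: List[AnnotatedUnit]) -> List[str]:
    """Interleave each grapheme unit with its phonemes and/or POS tag.

    The "<ph:X>" and "<pos:X>" markers are an assumed encoding, not the paper's.
    """
    tokens: List[str] = []
    for unit in units:
        tokens.append(unit.graphemes)
        tokens.extend(f"<ph:{p}>" for p in unit.phonemes)
        if unit.pos is not None:
            tokens.append(f"<pos:{unit.pos}>")
    return tokens


def split_hypothesis(tokens: List[str]) -> List[AnnotatedUnit]:
    """Recover grapheme units and their annotations from a decoded token sequence."""
    units: List[AnnotatedUnit] = []
    for tok in tokens:
        if tok.startswith("<ph:") and tok.endswith(">"):
            if units:  # ignore annotation tokens that precede any grapheme unit
                units[-1].phonemes.append(tok[4:-1])
        elif tok.startswith("<pos:") and tok.endswith(">"):
            if units:
                units[-1].pos = tok[5:-1]
        else:
            units.append(AnnotatedUnit(graphemes=tok))
    return units


def transcript(units: List[AnnotatedUnit]) -> str:
    """Plain transcription with all annotation tokens dropped."""
    return " ".join(u.graphemes for u in units)


example = [
    AnnotatedUnit("the", ["DH", "AH"], "DET"),
    AnnotatedUnit("cat", ["K", "AE", "T"], "NOUN"),
]
target = build_target(example)
decoded = split_hypothesis(target)
```

Because the annotations live in the same output sequence as the graphemes, a single decoder pass yields both the transcription and the linguistic labels, and the plain transcript is recovered by discarding the annotation tokens.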

Bibliographic details

Source
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 1861-1871 (11 pages)
Authors
Motoi Omachi; Yuya Fujita; Shinji Watanabe; Matthew Wiesner