IEEE International Conference on Acoustics, Speech and Signal Processing

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model


Abstract

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read due to disfluencies, filler words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and the ASR system alike will be propagated to the next task in the pipeline. In this work, we propose an ASR post-processing model that aims to transform incorrect and noisy ASR output into readable text for humans and downstream tasks. We leverage the Metadata Extraction (MDE) corpus to construct a task-specific dataset for our study. Since the dataset is small, we propose a novel data augmentation method and use a two-stage training strategy to fine-tune the RoBERTa pre-trained model. On the constructed test set, our model outperforms a production two-step pipeline-based post-processing method by a large margin of 13.26 on readability-aware WER (RA-WER) and 17.53 on BLEU. Human evaluation also demonstrates that our method can generate more human-readable transcripts than the baseline method.
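For readers unfamiliar with the evaluation setup, the sketch below (not from the paper) shows how post-processed transcripts could be scored against human-readable reference transcripts with standard WER and BLEU implementations; the example sentence pairs and the use of the jiwer and sacrebleu packages are assumptions for illustration. The paper's RA-WER additionally tolerates valid alternative renderings of the same content (e.g., "5" vs. "five"), which this plain-WER sketch does not model.

```python
# Minimal sketch (not the authors' code): scoring post-processed ASR output
# against human-readable reference transcripts with standard WER and BLEU.
# Requires: pip install jiwer sacrebleu
import jiwer
import sacrebleu

# Hypothetical example pairs: model output vs. human-readable reference.
references = [
    "Let's meet at 3 p.m. on Friday.",
    "The budget is $2,500 for the first quarter.",
]
hypotheses = [
    "Let's meet at 3 pm on Friday.",
    "The budget is 2500 dollars for the first quarter.",
]

# Corpus-level word error rate (lower is better).
wer = jiwer.wer(references, hypotheses)

# Corpus-level BLEU (higher is better); sacrebleu expects a list of
# reference streams, one stream per set of references.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])

print(f"WER:  {wer * 100:.2f}")
print(f"BLEU: {bleu.score:.2f}")
```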
