Automatic punctuation generation for speech

Abstract

Automatic punctuation generation is an essential feature for many speech-to-text transcription tasks. This paper describes a maximum a posteriori (MAP) approach for inserting punctuation marks into raw word sequences obtained from automatic speech recognition (ASR). The system consists of an “acoustic model” (AM) for prosodic features (in practice, pause duration) and a “language model” (LM) for text-only features. The LM combines three components: an MLP-based trigger-word model, a forward trigram punctuation predictor, and a backward trigram punctuation predictor. Separating the acoustic and language models allows them to be trained on different corpora; in particular, the LM can be trained on large amounts of text for which no acoustic information is available. We find that the trigger-word LM is very useful, and that combining prosodic and lexical information yields further improvement. On reference transcripts, we achieve an F-measure of 81.0% for voicemails and 56.5% for podcasts; on ASR transcripts, we achieve 69.6% for voicemails.
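The MAP combination of an acoustic score and a language-model score can be illustrated with a small sketch. This is not the paper's implementation: it assumes each inter-word slot is decided independently (the trigram predictors described in the abstract would call for a search over sequences), it models pause duration with one Gaussian per punctuation class, and every number, name (`PAUSE_MODEL`, `map_punctuation`, `lm_weight`), and the punctuation inventory is hypothetical.

```python
import math

# Hypothetical punctuation inventory; "" means "no punctuation in this slot".
PUNCT = ("", ",", ".", "?")

# Hypothetical "acoustic model": (mean, std) of pause duration in seconds
# for each punctuation class. Values are made up for illustration.
PAUSE_MODEL = {
    "": (0.05, 0.05),
    ",": (0.25, 0.15),
    ".": (0.60, 0.30),
    "?": (0.60, 0.30),
}


def log_gauss(x, mean, std):
    """Log density of a univariate Gaussian at x."""
    var = std * std
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def map_punctuation(pauses, lm_log_probs, lm_weight=1.0):
    """Pick, for each slot, the punctuation mark p that maximizes
    log P(pause | p) + lm_weight * log P(p | words).

    pauses       -- pause duration after each word, in seconds
    lm_log_probs -- one dict per slot mapping each mark to its LM log-probability
    """
    decisions = []
    for pause, lm in zip(pauses, lm_log_probs):
        best = max(
            PUNCT,
            key=lambda p: log_gauss(pause, *PAUSE_MODEL[p]) + lm_weight * lm[p],
        )
        decisions.append(best)
    return decisions


if __name__ == "__main__":
    # Two slots: a short pause where the LM expects nothing,
    # and a long pause where the LM favors a full stop.
    lm = [
        {"": math.log(0.90), ",": math.log(0.05), ".": math.log(0.04), "?": math.log(0.01)},
        {"": math.log(0.10), ",": math.log(0.10), ".": math.log(0.70), "?": math.log(0.10)},
    ]
    print(map_punctuation([0.02, 0.7], lm))
```

Because the two scores are only added at decision time, the pause model and the text model can be estimated from entirely different data, which is the separation the abstract highlights. In this sketch "." and "?" share the same pause statistics, so only the LM term can tell them apart.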
