Venue: Conference of the International Speech Communication Association

Improved Models for Automatic Punctuation Prediction for Spoken and Written Text

Abstract

This paper presents improved models for the automatic prediction of punctuation marks in written or spoken text. Various kinds of textual features are combined using Conditional Random Fields. These features include language model scores, token n-grams, sentence length, and syntactic information extracted from parse trees. The resulting models are evaluated on several different tasks, ranging from formal newspaper text to informal, dictated messages and documents, and from written text to spoken text. The newly developed models outperform a hidden-event language model by up to 26% relative in F-score. Evaluation of punctuation prediction on erroneous ASR output as well as evaluation against multiple references is not straightforward. We propose modifications of existing evaluation methods to handle these cases.
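The F-score comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified setup that the abstract does not specify: reference and hypothesis are aligned token by token, and each token carries one label for the punctuation mark that follows it. The label `O`, the function names, and all numbers are hypothetical and are not taken from the paper. The sketch does not cover the paper's modified evaluation for erroneous ASR output or multiple references.

```python
from typing import Sequence, Tuple

NONE = "O"  # hypothetical label meaning "no punctuation after this token"


def punctuation_f_score(
    reference: Sequence[str], hypothesis: Sequence[str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over punctuation slots.

    Each sequence holds one label per token: the mark that follows the
    token, or NONE. A predicted mark counts as correct only if the
    reference has the same mark in the same slot.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must be aligned token by token")
    correct = sum(
        1 for r, h in zip(reference, hypothesis) if r != NONE and r == h
    )
    predicted = sum(1 for h in hypothesis if h != NONE)
    expected = sum(1 for r in reference if r != NONE)
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, e.g. 0.2 for 20%."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Toy data for "hello, how are you? i am fine." (invented for illustration)
ref = [",", NONE, NONE, "?", NONE, NONE, "."]
hyp = [NONE, NONE, NONE, "?", NONE, NONE, "."]
precision, recall, f1 = punctuation_f_score(ref, hyp)

# Invented F-scores, not results from the paper.
gain = relative_improvement(0.5, 0.6)
```

Here the hypothesis misses the comma, so precision is 1.0 while recall is 2/3. A "relative" improvement divides the F-score difference by the baseline F-score, so it is not the same as a difference in absolute points.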
