Nordic Conference of Computational Linguistics

Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations



Abstract

In this paper we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text. To our knowledge this will be the largest publicly available dataset with prosodic labels. We describe the dataset construction and the resulting benchmark dataset in detail and train a number of different models ranging from feature-based classifiers to neural network systems for the prediction of discretized prosodic prominence. We show that pre-trained contextualized word representations from BERT outperform the other models even with less than 10% of the training data. Finally we discuss the dataset in light of the results and point to future research and plans for further improving both the dataset and methods of predicting prosodic prominence from text. The dataset and the code for the models are publicly available.
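The abstract frames the task as predicting a discretized prominence label for each word of written text, with pre-trained contextualized representations from BERT as the strongest model. Below is a minimal sketch, not the authors' released code, of how such a BERT-based token classifier can be set up with the Hugging Face transformers library. The checkpoint name, the number of prominence classes, and the example sentence are assumptions for illustration; the model head would still need to be fine-tuned on the prosodically labeled dataset before its predictions are meaningful.

```python
# Sketch (assumptions: bert-base-uncased checkpoint, 3 prominence classes) of
# treating prosodic prominence prediction as token-level classification.
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

NUM_PROMINENCE_CLASSES = 3  # assumption: e.g. non-prominent / prominent / highly prominent

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_PROMINENCE_CLASSES
)

sentence = "and it was filled with a pitying tenderness".split()
# Tokenize pre-split words so word pieces can be mapped back to whole words.
enc = tokenizer(sentence, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, num_wordpieces, num_labels)

# Use the prediction of the first word piece of each word as that word's label.
pred = logits.argmax(-1)[0].tolist()
word_ids = enc.word_ids(0)
seen = set()
for piece_idx, wid in enumerate(word_ids):
    if wid is None or wid in seen:
        continue
    seen.add(wid)
    print(sentence[wid], pred[piece_idx])
```

Fine-tuning this classification head on the benchmark's word-level prominence labels, rather than running the randomly initialized head as shown here, is what the abstract's comparison against feature-based and other neural baselines refers to.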

