Nordic Conference of Computational Linguistics

Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations



Abstract

In this paper we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text. To our knowledge, this will be the largest publicly available dataset with prosodic labels. We describe the dataset construction and the resulting benchmark dataset in detail and train a number of different models, ranging from feature-based classifiers to neural network systems, for the prediction of discretized prosodic prominence. We show that pre-trained contextualized word representations from BERT outperform the other models even with less than 10% of the training data. Finally, we discuss the dataset in light of the results and point to future research and plans for further improving both the dataset and methods of predicting prosodic prominence from text. The dataset and the code for the models are publicly available.
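The abstract frames prominence prediction as token-level classification over written text using pre-trained contextualized representations from BERT. As a concrete illustration only (not the authors' released code), the sketch below shows how such a setup can be expressed with the Hugging Face Transformers library; the three-class label set, the bert-base-uncased checkpoint, and the example sentence are assumptions made for this example.

```python
# Illustrative sketch: discretized prosodic prominence as token-level
# classification with pre-trained BERT (not the paper's released implementation).
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

NUM_CLASSES = 3  # assumed discretized prominence levels, e.g. 0 = non-prominent, 2 = highly prominent

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_CLASSES
)  # the classification head is randomly initialized and would need fine-tuning on prominence labels

words = ["the", "cat", "sat", "on", "the", "mat"]  # illustrative input sentence
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, num_subword_tokens, NUM_CLASSES)

# Collapse sub-word predictions back to word level by taking each word's first sub-token.
pred = logits.argmax(dim=-1)[0]
seen = set()
for idx, word_id in enumerate(enc.word_ids(0)):
    if word_id is None or word_id in seen:  # skip special tokens and continuation pieces
        continue
    seen.add(word_id)
    print(f"{words[word_id]}\t{int(pred[idx])}")
```

Training such a model would amount to standard fine-tuning with a per-token cross-entropy loss over the prominence labels; the dataset and the authors' own code are stated in the abstract to be publicly available.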
