Annual Meeting of the Association for Computational Linguistics

Shape of Synth to Come: Why We Should Use Synthetic Data for English Surface Realization

Abstract

The Surface Realization Shared Tasks of 2018 and 2019 were Natural Language Generation shared tasks with the goal of exploring approaches to surface realization from Universal-Dependency-like trees to surface strings for several languages. In the 2018 shared task there was very little difference in the absolute performance of systems trained with and without additional, synthetically created data, and a new rule prohibiting the use of synthetic data was introduced for the 2019 shared task. Contrary to the findings of the 2018 shared task, we show, in experiments on the English 2018 dataset, that the use of synthetic data can have a substantial positive effect: an improvement of almost 8 BLEU points for a previously state-of-the-art system. We analyse the effects of synthetic data, and we argue that its use should be encouraged rather than prohibited so that future research efforts continue to explore systems that can take advantage of such data.
