Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)

Neural Data-to-Text Generation with LM-based Text Augmentation



Abstract

For many new application domains for data-to-text generation, the main obstacle to training neural models is a lack of training data. While large numbers of instances are usually available on the data side, often only very few text samples are available. To address this problem, we propose a novel few-shot approach for this setting. Our approach automatically augments the data available for training by (i) generating new text samples in which specific values are replaced by alternative ones from the same category, (ii) generating new text samples based on GPT-2, and (iii) proposing an automatic method for pairing the new text samples with data samples. As the text augmentation can introduce noise into the training data, we use cycle consistency as an objective in order to ensure that a given data sample can be correctly reconstructed after having been formulated as text (and that text samples can be reconstructed from data). On both the E2E and WebNLG benchmarks, we show that this weakly supervised training paradigm is able to outperform fully supervised seq2seq models while using less than 10% of the annotations. By utilizing all annotated data, our model can boost the performance of a standard seq2seq model by over 5 BLEU points, establishing a new state of the art on both datasets.
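Step (i) of the augmentation can be sketched as a simple slot-value swap: given a text sample and its slot annotations, replace one value with an alternative from the same category to obtain a new data-text pair. The category inventory and all names below are illustrative assumptions, not the paper's actual code or data.

```python
# Hypothetical category inventory; in practice these values would come
# from the dataset's ontology (e.g. the E2E meaning representations).
CATEGORY_VALUES = {
    "food": ["Italian", "Chinese", "French"],
    "area": ["riverside", "city centre"],
}

def augment_by_value_swap(text, slots):
    """Yield new (text, slots) pairs in which one slot value is
    replaced by an alternative value from the same category."""
    for category, value in slots.items():
        for alternative in CATEGORY_VALUES.get(category, []):
            if alternative == value:
                continue
            # Replace the surface form in the text and update the slot.
            new_text = text.replace(value, alternative)
            new_slots = dict(slots, **{category: alternative})
            yield new_text, new_slots

pairs = list(augment_by_value_swap(
    "The Eagle serves Italian food in the riverside area.",
    {"food": "Italian", "area": "riverside"},
))
# Each original pair yields one augmented pair per alternative value.
```

This toy version assumes the slot value appears verbatim in the text; handling paraphrased realizations would require alignment, which is part of what the pairing step (iii) addresses.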
机译:对于数据到文本生成的许多新应用领域,培训神经模型中的主要障碍包括缺乏培训数据。虽然通常在数据端可用大量实例,但通常只有很少的文本样本。为了解决这个问题,我们在此提出了一种新颖的少量拍摄方法。我们的方法会自动增强可用于培训的数据(Ⅰ)基于从同一类别的替代类别取代特定值,(Ⅱ)基于GPT-2的新文本样品,(Ⅲ)提出自动化将新文本样本与数据样本配对的方法。由于文本增强可以向训练数据引入噪声,我们使用循环一致性作为目标,以确保在已制定为文本之后可以正确重建给定的数据样本(并且可以从数据重建该文本样本)。在E2E和WebnLG基准测试中,我们表明,这种弱监督的培训范式能够以不到10%的注释,完全监督的SEQ2SEQ模型。通过利用所有注释的数据,我们的模型可以通过超过5个BLEU点提高标准SEQ2SEQ模型的性能,在两个数据集中建立新的最先进。
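The cycle-consistency objective can be pictured as a round trip: a data-to-text model produces text, a text-to-data model reconstructs the input, and any mismatch is penalized, which downweights noisy augmented pairs. The sketch below uses toy placeholder functions in place of the paper's neural models.

```python
def cycle_consistency_loss(data, data_to_text, text_to_data, loss_fn):
    """data -> text -> reconstructed data; penalize the mismatch
    between the original and reconstructed data."""
    generated_text = data_to_text(data)
    reconstructed = text_to_data(generated_text)
    return loss_fn(reconstructed, data)

# Toy round trip: serialize slots to text, then parse them back.
to_text = lambda d: "; ".join(f"{k}={v}" for k, v in sorted(d.items()))
to_data = lambda t: dict(kv.split("=") for kv in t.split("; "))
mismatches = lambda a, b: sum(a.get(k) != v for k, v in b.items())

loss = cycle_consistency_loss({"food": "Italian"}, to_text, to_data, mismatches)
# A perfect round trip gives zero loss.
```

In the actual model the two directions are learned seq2seq components and the loss is differentiable; the toy parser only illustrates why a faithful round trip certifies an augmented pair.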


