Conference of the European Chapter of the Association for Computational Linguistics

Quantifying Appropriateness of Summarization Data for Curriculum Learning



Abstract

Much research has reported that the training data of summarization models is noisy: summaries often do not reflect what is written in the source texts. We propose an effective curriculum learning method for training summarization models on such noisy data. Curriculum learning has been used to train sequence-to-sequence models on noisy data. In translation tasks, previous research quantified the noise in training data using two models, one trained on a noisy corpus and one on a clean corpus. Because such paired corpora do not exist for summarization, we propose a model that quantifies noise from a single noisy corpus. We conduct experiments on three summarization models (one pretrained and two non-pretrained) and verify that our method improves performance. Furthermore, we analyze how different curricula affect the performance of pretrained and non-pretrained summarization models. Our human evaluation results also show that our method improves the performance of summarization models.
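The core idea above, scoring how well each summary is grounded in its source and training on the cleanest pairs first, can be sketched as follows. This is a minimal illustration, not the paper's method: the `overlap_score` proxy (fraction of summary tokens appearing in the source) is a hypothetical stand-in for the paper's learned noise-quantification model, which is trained from a single noisy corpus.

```python
def overlap_score(source: str, summary: str) -> float:
    """Toy appropriateness proxy: fraction of summary tokens found in the source.
    A low score suggests a noisy pair (the summary is not grounded in the source).
    The paper instead learns this score from a single noisy corpus."""
    src_tokens = set(source.lower().split())
    toks = summary.lower().split()
    if not toks:
        return 0.0
    return sum(t in src_tokens for t in toks) / len(toks)

def curriculum_order(pairs):
    """Order (source, summary) pairs from cleanest to noisiest, so that
    training starts on well-grounded examples and ends on noisy ones."""
    return sorted(pairs, key=lambda p: overlap_score(*p), reverse=True)

# Hypothetical mini-corpus: one grounded pair, one noisy pair.
pairs = [
    ("stocks fell sharply in early trading today", "aliens land on mars"),
    ("the cat sat on the mat all afternoon", "cat sat on mat"),
]
ordered = curriculum_order(pairs)
# The grounded pair comes first in the curriculum.
```

In practice the curriculum is applied by feeding batches to the summarization model in this order (or by gradually admitting noisier examples over epochs), rather than by discarding low-scoring pairs outright.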
