Workshop of the Australasian Language Technology Association

Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP



Abstract

Catastrophic forgetting - whereby a model trained on one task is fine-tuned on a second and, in doing so, suffers a "catastrophic" drop in performance on the first task - is a hurdle in the development of better transfer learning techniques. Despite impressive progress in reducing catastrophic forgetting, we have a limited understanding of how different architectures and hyper-parameters affect forgetting in a network. In this paper, we aim to understand the factors which cause forgetting during sequential training. Our primary finding is that CNNs forget less than LSTMs. We show that max-pooling is the underlying operation which helps CNNs alleviate forgetting relative to LSTMs. We also find that curriculum learning (Bengio et al., 2009), i.e. placing a hard task towards the end of the task sequence, reduces forgetting. Finally, we analyse the effect of fine-tuning contextual embeddings on catastrophic forgetting and find that using fixed word embeddings is preferable to fine-tuning.
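
To make the setup described in the abstract concrete, below is a minimal sketch (not the authors' code) of the sequential-training protocol it refers to: an encoder is trained on task A, then fine-tuned on task B, and forgetting is measured as the drop in task-A accuracy. The CNN encoder applies max-pooling over time, the operation the abstract highlights, while the LSTM uses its final hidden state. The data here is random synthetic stand-in data and all names (fake_task, CNNEncoder, LSTMEncoder) are hypothetical, assuming a PyTorch environment.

# Minimal sketch of measuring catastrophic forgetting under sequential training.
# Synthetic data stands in for real NLP tasks; replace fake_task with actual datasets.
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """1-D convolution over token embeddings followed by max-pooling over time."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, x):                      # x: (batch, seq_len)
        h = self.emb(x).transpose(1, 2)        # (batch, dim, seq_len)
        h = torch.relu(self.conv(h))
        return h.max(dim=2).values             # max-pool over time -> (batch, dim)

class LSTMEncoder(nn.Module):
    """Unidirectional LSTM; the final hidden state is the sentence representation."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, x):
        _, (h, _) = self.lstm(self.emb(x))
        return h[-1]                           # (batch, dim)

def train(encoder, head, data, epochs=3):
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(head(encoder(x)), y).backward()
            opt.step()

def accuracy(encoder, head, data):
    correct = total = 0
    with torch.no_grad():
        for x, y in data:
            correct += (head(encoder(x)).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

def fake_task(n=256, seq_len=20, classes=2):
    """Random stand-in for a real task, batched into tuples of (tokens, labels)."""
    x = torch.randint(0, 1000, (n, seq_len))
    y = torch.randint(0, classes, (n,))
    return [(x[i:i + 32], y[i:i + 32]) for i in range(0, n, 32)]

for name, Enc in [("CNN", CNNEncoder), ("LSTM", LSTMEncoder)]:
    torch.manual_seed(0)
    enc, head_a, head_b = Enc(), nn.Linear(64, 2), nn.Linear(64, 2)
    task_a, task_b = fake_task(), fake_task()
    train(enc, head_a, task_a)                 # train on task A
    acc_before = accuracy(enc, head_a, task_a)
    train(enc, head_b, task_b)                 # then fine-tune on task B
    acc_after = accuracy(enc, head_a, task_a)  # re-evaluate task A with its original head
    print(f"{name}: forgetting = {acc_before - acc_after:.3f}")

With real task data, the same loop would also support the curriculum-learning comparison mentioned in the abstract, simply by changing the order in which the tasks are passed to train().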
