Conference: Annual Meeting of the Association for Computational Linguistics

Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

Abstract

While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method proposed by Zhang et al. (2017) and discover that it cannot reliably tell whether a given sentence occurs in the input paragraph or not. We formulate a sentence content task to probe for this basic linguistic property and find that even a much simpler bag-of-words method has no trouble solving it. This result motivates us to replace the reconstruction-based objective of Zhang et al. (2017) with our sentence content probe objective in a semi-supervised setting. Despite its simplicity, our objective improves over paragraph reconstruction in terms of (1) downstream classification accuracies on benchmark datasets, (2) faster training, and (3) better generalization ability.
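The sentence content task described above can be illustrated with a minimal sketch. It assumes a paragraph is a list of sentence strings, that positive examples are sentences drawn from the paragraph, and that negatives are drawn from a different paragraph; the abstract does not specify the sampling scheme, so that part is an assumption. The names `make_probe_pairs` and `bow_contains` are hypothetical, and the word-count containment check is only a toy stand-in for a bag-of-words approach, not the model evaluated in the paper.

```python
import random
from collections import Counter


def make_probe_pairs(paragraphs, seed=0):
    """Build (paragraph_index, sentence, label) probe examples.

    label 1: the sentence was drawn from that paragraph.
    label 0: the sentence was drawn from a different paragraph
             (assumed negative-sampling scheme, not taken from the paper).
    """
    rng = random.Random(seed)
    pairs = []
    for i, paragraph in enumerate(paragraphs):
        # Positive example: a sentence that occurs in paragraph i.
        pairs.append((i, rng.choice(paragraph), 1))
        # Negative example: a sentence from some other paragraph.
        other = rng.choice([k for k in range(len(paragraphs)) if k != i])
        pairs.append((i, rng.choice(paragraphs[other]), 0))
    return pairs


def bow(text):
    """Lowercased whitespace-token counts (a toy bag-of-words)."""
    return Counter(text.lower().split())


def bow_contains(paragraph, sentence):
    """True if every word of the sentence appears in the paragraph
    at least as often as it appears in the sentence."""
    paragraph_counts = bow(" ".join(paragraph))
    return all(paragraph_counts[word] >= count
               for word, count in bow(sentence).items())


# Toy data with disjoint vocabularies across paragraphs (illustrative only).
paragraphs = [
    ["the movie was great", "i loved the acting"],
    ["service was slow", "food arrived cold"],
    ["battery lasts long", "screen looks sharp"],
]
pairs = make_probe_pairs(paragraphs)
predictions = [bow_contains(paragraphs[i], s) for i, s, _ in pairs]
```

On toy data like this, word counts alone separate positives from negatives, which gives some intuition for why a bag-of-words representation could handle the probe. Real paragraphs share vocabulary, so a containment check of this kind would produce false positives there.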