Annual Meeting of the Association for Computational Linguistics

DisSent: Learning Sentence Representations from Explicit Discourse Relations



Abstract

Learning effective representations of sentences is one of the core missions of natural language understanding. Existing models either train on a vast amount of text, or require costly, manually curated sentence relation datasets. We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. We show that our curated dataset provides an excellent signal for learning vector representations of sentence meaning, representing relations that can only be determined when the meanings of two sentences are combined. We demonstrate that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT. Our fixed sentence embeddings achieve high performance on a variety of transfer tasks, including SentEval, and we achieve state-of-the-art results on Penn Discourse Treebank's implicit relation prediction task.
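To make the curation step concrete, the sketch below shows the general idea of using a dependency parse to split a sentence containing an explicit discourse connective into an (S1, marker, S2) training triple. It is a minimal illustration assuming spaCy for parsing; the marker set, dependency labels, and clause-splitting rule are simplified placeholders, not the authors' actual extraction rubrics.

```python
# Illustrative sketch: extract an (S1, marker, S2) triple from a sentence that
# contains an explicit discourse connective. The downstream training task is to
# predict the marker from encodings of S1 and S2.
# NOTE: the marker set, dependency labels, and split rule are simplified
# placeholders, not the paper's actual rule-based rubrics.

import spacy

nlp = spacy.load("en_core_web_sm")

# Small illustrative subset of explicit discourse markers.
MARKERS = {"because", "but", "although", "so", "when", "while", "if", "before", "after"}

def extract_pair(sentence: str):
    """Return (s1, marker, s2) if a mid-sentence discourse marker splits the
    sentence into two non-empty clauses, else None."""
    doc = nlp(sentence)
    for tok in doc:
        if (
            tok.text.lower() in MARKERS
            and tok.i > 0                             # marker is not sentence-initial
            and tok.dep_ in {"mark", "cc", "advmod"}  # acts as a connective in the parse
        ):
            s1 = doc[: tok.i].text.strip()
            s2 = doc[tok.i + 1 :].text.strip()
            if s1 and s2:
                return s1, tok.text.lower(), s2
    return None

print(extract_pair("She stayed home because it was raining."))
# ('She stayed home', 'because', 'it was raining.')
```

A classifier over the two sentence embeddings produced by the bidirectional LSTM encoder, trained to predict the withheld marker, then provides the discourse-relation supervision signal the abstract describes.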
