Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

Abstract

While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method proposed by Zhang et al. (2017) and discover that it cannot reliably tell whether a given sentence occurs in the input paragraph or not. We formulate a sentence content task to probe for this basic linguistic property and find that even a much simpler bag-of-words method has no trouble solving it. This result motivates us to replace the reconstruction-based objective of Zhang et al. (2017) with our sentence content probe objective in a semi-supervised setting. Despite its simplicity, our objective improves over paragraph reconstruction in terms of (1) downstream classification accuracies on benchmark datasets, (2) faster training, and (3) better generalization ability.
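
The sentence content task can be illustrated with a minimal sketch. The abstract does not say how negative examples are drawn or which probe classifier is used. The sketch therefore makes three assumptions. Each paragraph yields one positive pair, a sentence taken from the paragraph itself. Each paragraph also yields one negative pair, a sentence sampled from a different paragraph. A hypothetical bag-of-words overlap score stands in for a trained probe. All helper names (`make_probe_examples`, `bow_overlap`, `predict`) are illustrative, not the authors' code. The sketch shows why a bag-of-words representation finds the task easy: every token of a sentence drawn from the paragraph is, by construction, already in the paragraph's bag of words.

```python
import random
from collections import Counter

# Hypothetical helpers illustrating a sentence content probe; not the authors' implementation.


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace tokenization (a simplifying assumption).
    return text.lower().split()


def make_probe_examples(paragraphs: list[list[str]], seed: int = 0) -> list[tuple[list[str], str, int]]:
    """Build (paragraph, candidate_sentence, label) triples.

    Assumption: one positive (a sentence from the paragraph, label 1) and one
    negative (a sentence from a different paragraph, label 0) per paragraph.
    Requires at least two paragraphs.
    """
    rng = random.Random(seed)
    examples = []
    for i, para in enumerate(paragraphs):
        examples.append((para, rng.choice(para), 1))
        other = rng.choice([k for k in range(len(paragraphs)) if k != i])
        examples.append((para, rng.choice(paragraphs[other]), 0))
    return examples


def bow_overlap(paragraph: list[str], sentence: str) -> float:
    """Fraction of the sentence's tokens (with multiplicity) covered by the paragraph's bag of words."""
    para_bag = Counter(tok for sent in paragraph for tok in tokenize(sent))
    sent_bag = Counter(tokenize(sentence))
    covered = sum(min(count, para_bag[tok]) for tok, count in sent_bag.items())
    return covered / max(1, sum(sent_bag.values()))


def predict(paragraph: list[str], sentence: str, threshold: float = 1.0) -> int:
    # Predict "sentence occurs in paragraph" when its tokens are fully covered.
    return int(bow_overlap(paragraph, sentence) >= threshold)


# Toy data, invented for illustration.
paragraphs = [
    ["the movie was long", "the acting felt wooden"],
    ["service at the cafe was quick", "coffee tasted burnt"],
    ["this phone charges fast", "battery life is excellent"],
]
examples = make_probe_examples(paragraphs)
accuracy = sum(predict(p, s) == y for p, s, y in examples) / len(examples)
print(f"bag-of-words probe accuracy: {accuracy:.2f}")
```

With a threshold of 1.0, a positive sentence always scores exactly 1.0. Errors can only come from negatives whose tokens all happen to occur in the paragraph. A learned probe over paragraph and sentence embeddings would replace the fixed threshold used here.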

Bibliographic record

Source: Annual Meeting of the Association for Computational Linguistics, 2019, pp. 6331-6338 (8 pages)
Authors: Tu Vu; Mohit Iyyer
Original format: PDF

Similar documents

1. An enhanced embedding method using inter-sentence, inter-word, end-of-line and inter-paragraph spacing [J]. Lip Yee Por, Kok Onn Chee, Tan Fong Ang. International Journal of Physical Sciences, 2011, no. 36.

2. Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records [J]. Qingyu Chen, Jingcheng Du, Sun Kim. BMC Medical Informatics and Decision Making, 2020, no. 1.

3. Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension [J]. Narongsak Chayangkoon, Anongnart Srivihok. Engineering and Applied Science Research, 2021, no. 5.

4. Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification [C]. Tu Vu, Mohit Iyyer. Annual Meeting of the Association for Computational Linguistics, 2019.

5. Remembering the GULAG: Community, Identity and Cultural Memory in Russia's Far North, 1987-2018 [D]. Kirk, Tyler C. 2019.

6. Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records [O]. Qingyu Chen, Jingcheng Du, Sun Kim. 2020.

7. Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification [O]. Tu Vu, Mohit Iyyer. 2019.
