Topics to Avoid: Demoting Latent Confounds in Text Classification

Abstract

Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation with respect to the task of native language identification. We find that standard text classifiers which perform well on the test set end up learning topical features which are confounds of the prediction task (e.g., if the input text mentions Sweden, the classifier predicts that the author's native language is Swedish). We propose a method that represents the latent topical confounds and a model which "unlearns" confounding features by predicting both the label of the input text and the confound; but we train the two predictors adversarially in an alternating fashion to learn a text representation that predicts the correct label but is less prone to using information about the confound. We show that this model generalizes better and learns features that are indicative of the writing style rather than the content.
