European Conference on Information Retrieval

Classifying Scientific Publications with BERT - Is Self-attention a Feature Selection Method?

Abstract

We investigate the self-attention mechanism of BERT in a fine-tuning scenario for the classification of scientific articles over a taxonomy of research disciplines. We observe how self-attention focuses on words that are highly related to the domain of the article. In particular, a small subset of vocabulary words tends to receive most of the attention. We compare and evaluate the subset of most-attended words against feature selection methods commonly used for text classification in order to characterize self-attention as a possible feature selection approach. Using ConceptNet as ground truth, we also find that attended words are more related to the research fields of the articles. However, conventional feature selection methods remain a better option for learning classifiers from scratch. This result suggests that, while self-attention identifies domain-relevant terms, the discriminatory information in BERT is encoded in the contextualized outputs and the classification layer. It also raises the question of whether injecting feature selection methods into the self-attention mechanism could further optimize single-sequence classification with transformers.
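
To make the measurement concrete, below is a minimal sketch of how per-token attention mass could be aggregated from a fine-tuned BERT classifier. This is not the authors' code: the checkpoint name, the choice of layer, and head-averaging are illustrative assumptions, and a real experiment would run over the full corpus rather than a toy list.

```python
# Sketch: rank vocabulary tokens by the total attention they receive in one
# BERT layer, averaged over heads, accumulated over a corpus. Assumptions:
# HuggingFace transformers, a stand-in checkpoint, last layer, head averaging.
from collections import Counter

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "bert-base-uncased"  # assumption: stand-in for a fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, output_attentions=True
)
model.eval()

def attention_mass(texts, layer=-1):
    """Accumulate, per vocabulary token, the attention it receives
    (averaged over heads) in the chosen layer, across all texts."""
    mass = Counter()
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            out = model(**enc)
        # out.attentions: one (batch, heads, seq, seq) tensor per layer
        att = out.attentions[layer][0].mean(dim=0)  # (seq, seq), head-averaged
        received = att.sum(dim=0)  # column sums: attention received per position
        for tok_id, score in zip(enc["input_ids"][0].tolist(), received.tolist()):
            mass[tokenizer.convert_ids_to_tokens(tok_id)] += score
    return mass

# The most-attended tokens form the candidate "selected features".
abstracts = ["We study convolutional networks for galaxy morphology ..."]
print(attention_mass(abstracts).most_common(20))
```

An attention-derived ranking of this kind can then be compared against a conventional feature selection baseline such as chi-squared scores (e.g. `sklearn.feature_selection.chi2` over a bag-of-words matrix), which is the sort of method the abstract evaluates against.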
