
On the Importance of Delexicalization for Fact Verification


Abstract

While neural networks produce state-of-the-art performance in many NLP tasks, they generally learn from lexical information, which may transfer poorly between domains. Here, we investigate the importance that a model assigns to various aspects of the data while learning and making predictions, specifically in a recognizing textual entailment (RTE) task. By inspecting the attention weights assigned by the model, we confirm that most of the weight is assigned to noun phrases. To mitigate this dependence on lexicalized information, we experiment with two masking strategies. First, we replace named entities with their corresponding semantic tags, along with a unique identifier to indicate lexical overlap between claim and evidence. Second, we similarly replace other word classes in the sentence (nouns, verbs, adjectives, and adverbs) with their supersense tags (Ciaramita and Johnson, 2003). Our results show that, while performance on the in-domain dataset remains on par with that of the model trained on fully lexicalized data, it improves considerably when tested out of domain. For example, the performance of a state-of-the-art RTE model trained on the masked Fake News Challenge (Pomerleau and Rao, 2017) data and evaluated on Fact Extraction and Verification (Thorne et al., 2018) data improved by over 10% in accuracy score compared to the fully lexicalized model.
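The first masking strategy can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the tokens have already been tagged by some external named-entity recognizer, and the function name `delexicalize`, the `(word, tag)` input format, and the `TAG-n` identifier scheme are all hypothetical choices made for illustration. The point it shows is that an entity appearing in both claim and evidence receives the same masked token, so lexical overlap is preserved while the surface form is removed.

```python
from typing import Dict, List, Tuple

# A token is (surface form, entity tag); "O" marks a non-entity token.
# The tags are assumed to come from an external NER tagger.
Token = Tuple[str, str]


def delexicalize(claim: List[Token],
                 evidence: List[Token]) -> Tuple[List[str], List[str]]:
    """Replace each named entity with its tag plus an identifier.

    The identifier is shared between claim and evidence, so the same
    entity maps to the same masked token in both texts.
    (Hypothetical scheme: "<TAG>-<n>".)
    """
    ids: Dict[Tuple[str, str], str] = {}   # (lowercased word, tag) -> masked token
    counters: Dict[str, int] = {}          # tag -> number of distinct entities seen

    def mask(tokens: List[Token]) -> List[str]:
        out = []
        for word, tag in tokens:
            if tag == "O":
                out.append(word)           # non-entities are left lexicalized
                continue
            key = (word.lower(), tag)
            if key not in ids:
                counters[tag] = counters.get(tag, 0) + 1
                ids[key] = f"{tag}-{counters[tag]}"
            out.append(ids[key])
        return out

    # The claim is masked first so that its identifiers are reused in the evidence.
    return mask(claim), mask(evidence)


claim = [("Obama", "PERSON"), ("visited", "O"), ("Paris", "LOCATION")]
evidence = [("Obama", "PERSON"), ("met", "O"), ("Merkel", "PERSON"),
            ("in", "O"), ("Berlin", "LOCATION")]

masked_claim, masked_evidence = delexicalize(claim, evidence)
print(masked_claim)     # ['PERSON-1', 'visited', 'LOCATION-1']
print(masked_evidence)  # ['PERSON-1', 'met', 'PERSON-2', 'in', 'LOCATION-2']
```

Under the same assumptions, the second strategy would extend `mask` so that remaining nouns, verbs, adjectives, and adverbs are replaced by their supersense tags; that step needs a supersense tagger and is left out of this sketch.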
