
Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction



Abstract

In recent years, there has been a surge of interest in applying distant supervision (DS) to automatically generate training data for relation extraction (RE). In this paper, we study what limits the performance of DS-trained neural models, conduct thorough analyses, and identify a factor that can greatly influence performance: shifted label distribution. Specifically, we find that this problem commonly exists in real-world DS datasets, and that without special handling, typical DS-RE models cannot automatically adapt to this shift and therefore suffer deteriorated performance. To further validate our intuition, we develop a simple yet effective adaptation method for DS-trained models, bias adjustment, which updates a model learned on the source domain (i.e., the DS training set) with a label distribution estimated on the target domain (i.e., the test set). Experiments demonstrate that bias adjustment achieves consistent performance gains on DS-trained models, especially neural models, with up to a 23% relative F1 improvement, which verifies our assumptions. Our code and data can be found at https://github.com/INK-USC/shifted-label-distribution.
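The abstract describes bias adjustment only at a high level. The sketch below illustrates one common way such an adjustment can be realised: shifting a trained model's output logits by the log-ratio of the target label prior to the source label prior, so that a classifier fitted under the DS training distribution is re-calibrated toward the test-time distribution. The function names, the NumPy usage, and the toy numbers are illustrative assumptions, not the authors' implementation (see the linked repository for that).

import numpy as np

def estimate_label_distribution(labels, num_relations, smoothing=1.0):
    # Estimate a label prior from integer relation labels, with additive
    # smoothing so unseen relations do not get zero probability.
    counts = np.bincount(labels, minlength=num_relations).astype(float)
    counts += smoothing
    return counts / counts.sum()

def bias_adjust(logits, source_prior, target_prior):
    # Shift logits by log(target prior) - log(source prior); under a softmax
    # this re-weights class scores from the source (DS) label distribution
    # toward the target (test-set) label distribution.
    return logits + np.log(target_prior) - np.log(source_prior)

# Hypothetical usage with 5 relation types: priors estimated from DS training
# labels and from a small labelled sample drawn from the target domain.
num_relations = 5
source_prior = estimate_label_distribution(np.array([0, 0, 0, 1, 2, 0]), num_relations)
target_prior = estimate_label_distribution(np.array([0, 1, 2, 3, 4, 1]), num_relations)
logits = np.array([2.0, 0.5, 0.1, -1.0, -2.0])  # raw scores from a trained RE model
adjusted = bias_adjust(logits, source_prior, target_prior)
prediction = int(np.argmax(adjusted))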
