
Spotting Spurious Data with Neural Networks

Abstract

Automatic identification of spurious instances (those with potentially wrong labels in datasets) can improve the quality of existing language resources, especially when annotations are obtained through crowdsourcing or automatically generated based on coded rankings. In this paper, we present an effective approach, inspired by queueing theory and the psychology of learning, to automatically identify spurious instances in datasets. Our approach discriminates instances based on their "difficulty to learn," as determined by a downstream learner. Our method can be applied to any dataset, assuming that a neural network model exists for the dataset's target task. Our best approach outperforms competing state-of-the-art baselines and achieves a mean average precision (MAP) of 0.85 and 0.22 in identifying spurious instances in synthetic and carefully crowdsourced real-world datasets, respectively.
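
To make the evaluation setup concrete, the following is a minimal Python sketch of how a ranking of instances by "difficulty to learn" could be scored with average precision, the per-dataset quantity that MAP averages. It is not the paper's method: the abstract does not specify how difficulty is computed, so the sketch assumes a simple stand-in (mean per-instance training loss across epochs). The function names `difficulty_from_losses`, `average_precision`, and `mean_average_precision` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def difficulty_from_losses(loss_history: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMPTION: use the mean per-instance loss over epochs as a difficulty proxy.

    loss_history[e][i] is the loss of instance i at epoch e, as reported by
    whatever downstream learner is available. The paper's own difficulty
    measure is not described in the abstract and may differ.
    """
    n_epochs = len(loss_history)
    n_instances = len(loss_history[0])
    return [
        sum(loss_history[e][i] for e in range(n_epochs)) / n_epochs
        for i in range(n_instances)
    ]


def average_precision(scores: Sequence[float], is_spurious: Sequence[bool]) -> float:
    """Average precision when instances are ranked by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if is_spurious[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    runs: Sequence[Tuple[Sequence[float], Sequence[bool]]]
) -> float:
    """Mean of average precision over several (scores, labels) rankings."""
    aps = [average_precision(scores, labels) for scores, labels in runs]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy data: 2 epochs, 4 instances; instances 0 and 3 have wrong labels.
    losses = [[2.0, 0.2, 1.8, 0.7], [1.6, 0.1, 1.4, 0.5]]
    labels = [True, False, False, True]
    scores = difficulty_from_losses(losses)
    print(average_precision(scores, labels))
```

A higher value means that instances with wrong labels are concentrated near the top of the difficulty ranking, which is what the reported MAP figures measure.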
