
Lightly-supervised Representation Learning with Global Interpretability



Abstract

We propose a lightly-supervised approach for information extraction, in particular named entity classification, which combines the benefits of traditional bootstrapping, i.e., use of limited annotations and interpretability of extraction patterns, with the robust learning approaches proposed in representation learning. Our algorithm iteratively learns custom embeddings for both the multi-word entities to be extracted and the patterns that match them from a few example entities per category. We demonstrate that this representation-based approach outperforms three other state-of-the-art bootstrapping approaches on two datasets: CoNLL-2003 and OntoNotes. Additionally, using these embeddings, our approach outputs a globally-interpretable model consisting of a decision list, by ranking patterns based on their proximity to the average entity embedding in a given class. We show that this interpretable model performs close to our complete bootstrapping model, proving that representation learning can be used to produce interpretable models with small loss in performance. This decision list can be edited by human experts to mitigate some of that loss and in some cases outperform the original model.
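The pattern-ranking step described above can be illustrated with a minimal sketch: patterns are ordered by their proximity (here cosine similarity, as an assumption) to the average embedding of the known entities in a category, yielding a decision list. The function and variable names (`rank_patterns`, `entity_embs`, `pattern_embs`) are hypothetical and not taken from the paper.

```python
import numpy as np

def rank_patterns(entity_embs, pattern_embs):
    """Illustrative sketch: rank patterns by cosine similarity to the
    average entity embedding of one category (hypothetical names/shapes).

    entity_embs:  dict mapping entity string  -> embedding vector (np.ndarray)
    pattern_embs: dict mapping pattern string -> embedding vector (np.ndarray)
    Returns patterns sorted from most to least similar, forming a decision list.
    """
    # Average embedding of the example entities in this category.
    centroid = np.mean(list(entity_embs.values()), axis=0)
    centroid /= np.linalg.norm(centroid)

    def cosine(vec):
        return float(np.dot(vec, centroid) / np.linalg.norm(vec))

    # Higher similarity => pattern appears earlier in the decision list.
    return sorted(pattern_embs, key=lambda p: cosine(pattern_embs[p]), reverse=True)
```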
