Improving Distantly-supervised Entity Typing with Compact Latent Space Clustering

Abstract

Distant supervision has recently achieved great success in Fine-grained Entity Typing (FET). Although it efficiently reduces manual labeling effort, it also introduces false entity type labels, because distant supervision assigns labels in a context-agnostic manner. Existing work alleviates this issue with a partial-label loss, but usually suffers from confirmation bias, meaning that the classifier fits a pseudo data distribution that it produced itself. In this work, we propose to regularize distantly supervised models with Compact Latent Space Clustering (CLSC) to bypass this problem while still making effective use of noisy data. The proposed method first dynamically constructs a similarity graph over entity mentions, then infers the labels of noisy instances via label propagation. Based on the inferred labels, mention embeddings are updated to encourage entity mentions with close semantics to form a compact cluster in the embedding space, which leads to better classification performance. Extensive experiments on standard benchmarks show that our CLSC model consistently outperforms state-of-the-art distantly supervised entity typing systems by a significant margin.
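The first two steps described in the abstract (building a similarity graph over mentions, then propagating labels to noisy instances) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The cosine similarity, the softmax `temperature`, the clamping of clean rows, the fixed number of `steps`, and the function names `build_similarity_graph` and `propagate_labels` are all assumptions made for illustration only; the abstract does not specify any of them.

```python
import numpy as np


def build_similarity_graph(embeddings, temperature=0.1):
    """Row-stochastic transition matrix over mentions.

    Assumption: cosine similarity followed by a softmax over neighbors;
    the abstract does not state which similarity measure is used.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)           # exclude self-loops
    weights = np.exp(sim / temperature)      # exp(-inf) == 0 on the diagonal
    return weights / weights.sum(axis=1, keepdims=True)


def propagate_labels(transition, labels, is_clean, steps=10):
    """Generic iterative label propagation (hypothetical helper).

    Clean rows are clamped to their given labels after every step;
    noisy rows start from a uniform distribution over classes.
    """
    n_classes = labels.shape[1]
    scores = np.where(is_clean[:, None], labels, 1.0 / n_classes)
    for _ in range(steps):
        scores = transition @ scores
        scores[is_clean] = labels[is_clean]
    return scores


# Toy data: four 2-d "mention embeddings", two of them clean-labeled.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
is_clean = np.array([True, False, True, False])

transition = build_similarity_graph(embeddings)
scores = propagate_labels(transition, labels, is_clean)
inferred = scores.argmax(axis=1)
print(inferred)
```

In this toy setup each noisy mention takes the label of the clean mention it lies closest to. The third step in the abstract, updating the embeddings so that mentions with the same inferred label form compact clusters, needs a trainable encoder and a clustering loss and is deliberately left out of the sketch.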