International Conference on Principles of Knowledge Representation and Reasoning

Seq2KG: An End-to-End Neural Model for Domain Agnostic Knowledge Graph (not Text Graph) Construction from Text


Abstract

Knowledge Graph Construction (KGC) from text unlocks information held within unstructured text and is critical to a wide range of downstream applications. General approaches to KGC from text rely heavily on the existence of knowledge bases, yet most domains do not have an external knowledge base readily available. In many situations this results in information loss, as a wealth of key information is held within "non-entities". Domain-specific approaches to KGC typically adopt unsupervised pipelines, using carefully crafted linguistic and statistical patterns to extract co-occurring noun phrases as triples, essentially constructing text graphs rather than true knowledge graphs. In this research, for the first time, in the same spirit as Collobert et al.'s seminal 2011 work "Natural language processing (almost) from scratch", we propose a Seq2KG model that attempts to achieve "knowledge graph construction (almost) from scratch". The end-to-end Sequence to Knowledge Graph (Seq2KG) neural model jointly learns to generate triples and to resolve entity types as a multi-label classification task through deep neural networks. In addition, a novel evaluation metric that takes both semantic and structural closeness into account is developed for measuring the performance of triple extraction. We show that our end-to-end Seq2KG model performs on par with a state-of-the-art rule-based system that outperformed other neural models and won first prize in the first Knowledge Graph Contest in 2019. A new annotation scheme and three high-quality manually annotated datasets are made available to help promote this direction of research.
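The abstract's evaluation metric scores extracted triples on both semantic and structural closeness to a gold graph. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch of the idea: each predicted (head, relation, tail) triple is aligned to its best-matching gold triple, element similarities are averaged, and an F1-style score balances precision and recall. Token-overlap (Jaccard) similarity stands in for the embedding-based semantic similarity a real implementation would likely use; all function names are hypothetical.

```python
def token_sim(a: str, b: str) -> float:
    """Jaccard token overlap; a cheap stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def triple_sim(t1: tuple, t2: tuple) -> float:
    """Average element-wise similarity over (head, relation, tail)."""
    return sum(token_sim(x, y) for x, y in zip(t1, t2)) / 3

def graph_score(pred: list, gold: list) -> float:
    """F1 over best-match triple similarities: semantic closeness comes from
    token_sim, structural closeness from requiring whole-triple alignment."""
    if not pred or not gold:
        return 0.0
    precision = sum(max(triple_sim(p, g) for g in gold) for p in pred) / len(pred)
    recall = sum(max(triple_sim(g, p) for p in pred) for g in gold) / len(gold)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For example, predicting ("acme", "bought", "beta corp") against gold ("acme", "acquired", "beta corp") scores 2/3: two of three triple elements match exactly, and a soft metric credits the partial alignment where exact-match triple F1 would give zero.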
