Conference: International Conference on Semantic Systems

Using Weak Supervision to Identify Long-Tail Entities for Knowledge Base Completion

Abstract

Data from relational web tables can be used to augment cross-domain knowledge bases such as DBpedia, Wikidata, or the Google Knowledge Graph with descriptions of entities that are not yet part of the knowledge base. Such long-tail entities include, for instance, small villages, niche songs, or athletes who play in lower-level leagues. In previous work, we presented an approach that successfully assembles descriptions of long-tail entities from relational HTML tables using supervised matching methods and manually labeled training data in the form of positive and negative entity matches. Manually labeling training data is laborious when a knowledge base covers many different classes. In this work, we investigate reducing the labeling effort for long-tail entity extraction by using weak supervision. We present a bootstrapping approach that requires domain experts to provide a small set of simple, class-specific matching rules instead of labeling a large set of entity matches, thereby reducing the human supervision effort considerably. We evaluate this weak supervision approach and find that it performs only slightly worse than methods that rely on large sets of manually labeled entity matches.
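The idea of replacing hand-labeled entity matches with a few class-specific matching rules can be illustrated with a minimal sketch. This is not the authors' implementation: the rule functions, the field names (`title`, `artist`, `year`), the sample rows, and the agreement-based voting are all assumptions chosen for illustration. Each hypothetical rule votes match, non-match, or abstains on a pair of table rows, and pairs on which the non-abstaining rules agree become weakly labeled training data for a downstream supervised matcher (not shown).

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
# A rule returns True (match), False (non-match), or None (abstain).
Rule = Callable[[Record, Record], Optional[bool]]


def same_title_and_year(a: Record, b: Record) -> Optional[bool]:
    """Hypothetical rule for a 'song' class: equal title and year means match."""
    if not a.get("title") or not b.get("title"):
        return None
    if a["title"].lower() == b["title"].lower() and a.get("year") == b.get("year"):
        return True
    return None


def different_artist(a: Record, b: Record) -> Optional[bool]:
    """Hypothetical rule: two known but different artists means non-match."""
    if a.get("artist") and b.get("artist"):
        if a["artist"].lower() != b["artist"].lower():
            return False
    return None


def weak_label(a: Record, b: Record, rules: List[Rule]) -> Optional[bool]:
    """Return a label only if all non-abstaining rules agree, else None."""
    votes = {rule(a, b) for rule in rules} - {None}
    return votes.pop() if len(votes) == 1 else None


def build_training_set(
    pairs: List[Tuple[Record, Record]], rules: List[Rule]
) -> List[Tuple[Record, Record, bool]]:
    """Keep only the pairs that received an unambiguous weak label."""
    labeled = []
    for a, b in pairs:
        label = weak_label(a, b, rules)
        if label is not None:
            labeled.append((a, b, label))
    return labeled


# Made-up rows standing in for web-table rows of a 'song' class.
r1 = {"title": "Blue Hour", "artist": "The Larks", "year": "1998"}
r2 = {"title": "blue hour", "artist": "The Larks", "year": "1998"}
r3 = {"title": "Blue Hour", "artist": "Nine Pines", "year": "2004"}
r4 = {"title": "Blue Hour", "artist": "Nine Pines", "year": "1998"}

RULES = [same_title_and_year, different_artist]
training = build_training_set([(r1, r2), (r1, r3), (r1, r4)], RULES)
labels = [label for _, _, label in training]
```

In this sketch the pair `(r1, r4)` is dropped because the two rules disagree on it. Discarding conflicts is only one possible design choice; a weighted vote or a learned label model could be used instead.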