
(Digital) Goodies from the ERC Wishing Well: BabelNet, Babelfy, Video Games with a Purpose and the Wikipedia Bitaxonomy

Abstract

Multilinguality is a key feature of today's Web, and it is this feature that we leverage and exploit in our research work at the Sapienza University of Rome's Linguistic Computing Laboratory, which I will overview and showcase in this talk. The first resource I will present is BabelNet, a multilingual semantic network obtained by merging WordNet and Wikipedia. In order to construct the BabelNet network, we extract, at different stages: from WordNet, all available word senses (as concepts) and all the lexical and semantic pointers between synsets (as relations); from Wikipedia, all the Wikipages (as concepts) and the semantically unspecified relations given by their hyperlinks. WordNet and Wikipedia overlap both in terms of concepts and relations: this overlap makes the merging of the two resources possible, enabling the creation of a unified knowledge resource. In order to enable multilinguality, we collect the lexical realizations of the available concepts in different languages. Finally, we connect the multilingual Babel synsets by establishing semantic relations between them.

The second resource is Babelfy, a unified graph-based approach to multilingual Word Sense Disambiguation (WSD) and Entity Linking (EL) built on top of this network. Babelfy works in three steps: first, given a lexicalized semantic network, we associate with each vertex, i.e., either concept or named entity, a semantic signature, that is, a set of related vertices. This is a preliminary step which needs to be performed only once, independently of the input text. Second, given a text, we extract all the linkable fragments from it and, for each of them, list the possible meanings according to the semantic network. Third, we create a graph-based semantic interpretation of the whole text by linking the candidate meanings of the extracted fragments using the previously computed semantic signatures. We then extract a dense subgraph of this representation and select the best candidate meaning for each fragment. Our experiments show state-of-the-art performance on both WSD and EL on six different datasets, including a multilingual setting.

In the third part of the talk I will present two novel approaches to large-scale knowledge acquisition and validation developed in my lab. I will first introduce video games with a purpose (Vannella et al., 2014), a novel, powerful paradigm for the large-scale acquisition and validation of knowledge and data. We demonstrate that converting games with a purpose into more traditional video games provides a fun component that motivates players to annotate for free, thereby significantly lowering annotation costs below those of crowdsourcing. Moreover, we show that video games with a purpose produce higher-quality annotations than crowdsourcing.

The second approach is WiBi, the Wikipedia Bitaxonomy: the largest and most accurate currently available taxonomy of Wikipedia pages and taxonomy of Wikipedia categories, aligned to each other. WiBi is created in three steps: we first create a taxonomy for the Wikipedia pages by parsing textual definitions, extracting the hypernym(s) and disambiguating them according to the page inventory; next, we leverage the hypernyms in the page taxonomy, together with their links to the corresponding categories, so as to induce a taxonomy over Wikipedia categories while at the same time improving the page taxonomy in an iterative way; finally, we employ structural heuristics to overcome inherent problems affecting categories. The output of our three-phase approach is a bitaxonomy of millions of pages and hundreds of thousands of categories for the English Wikipedia.
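The BabelNet construction described in the first paragraph can be pictured with a small, self-contained sketch. Everything below is a toy illustration under assumed data structures (in-memory dictionaries and a precomputed page-to-synset mapping), not the real BabelNet code or API; the final step of adding semantic relations between Babel synsets is omitted.

```python
from dataclasses import dataclass, field

@dataclass
class BabelSynset:
    """A merged concept: WordNet senses, Wikipedia pages and multilingual lexicalizations."""
    wordnet_senses: set = field(default_factory=set)     # e.g. "bank.n.01"
    wiki_pages: set = field(default_factory=set)         # e.g. "Bank (finance)"
    lexicalizations: dict = field(default_factory=dict)  # language code -> set of words

def merge(wordnet, wikipedia, page_to_synset):
    """Merge the two concept inventories.

    wordnet:        synset id -> set of English lemmas
    wikipedia:      page title -> {language code -> set of labels}
    page_to_synset: precomputed mapping from Wikipages to overlapping WordNet synsets
    """
    synsets = {}
    # WordNet side: every synset becomes a concept with English lexicalizations.
    for sid, lemmas in wordnet.items():
        bs = synsets.setdefault(sid, BabelSynset())
        bs.wordnet_senses.add(sid)
        bs.lexicalizations.setdefault("en", set()).update(lemmas)
    # Wikipedia side: pages are merged into the mapped synset when an overlap exists,
    # otherwise they become concepts of their own; labels in any language are kept.
    for page, labels in wikipedia.items():
        key = page_to_synset.get(page, page)
        bs = synsets.setdefault(key, BabelSynset())
        bs.wiki_pages.add(page)
        for lang, words in labels.items():
            bs.lexicalizations.setdefault(lang, set()).update(words)
    return synsets

# Example (toy data):
# merge({"bank.n.01": {"bank"}},
#       {"Bank (finance)": {"en": {"bank"}, "it": {"banca"}}},
#       {"Bank (finance)": "bank.n.01"})
# yields a single Babel synset lexicalized as "bank" (en) and "banca" (it).
```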
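The three Babelfy steps (offline semantic signatures, fragment extraction with candidate meanings, graph-based selection) can be sketched in the same spirit. This is a heavily simplified stand-in: the signature computation and the densest-subgraph extraction are replaced here by a fixed-hop neighborhood and a greedy connectivity score, purely to make the shape of the pipeline concrete.

```python
import itertools

def semantic_signatures(graph, hops=2):
    """Step 1 (offline): for each vertex, the set of vertices reachable within `hops`."""
    sig = {}
    for v in graph:
        frontier, seen = {v}, {v}
        for _ in range(hops):
            frontier = {u for x in frontier for u in graph.get(x, ())} - seen
            seen |= frontier
        sig[v] = seen - {v}
    return sig

def candidate_meanings(text, lexicon):
    """Step 2: every fragment of the text the lexicon can link, with its possible senses."""
    tokens = text.lower().split()
    cands = {}
    for i, j in itertools.combinations(range(len(tokens) + 1), 2):
        frag = " ".join(tokens[i:j])
        if frag in lexicon:
            cands[(i, j, frag)] = lexicon[frag]
    return cands

def disambiguate(cands, sig):
    """Step 3 (simplified): pick, for each fragment, the candidate meaning most strongly
    connected to the other fragments' candidates, where two meanings are connected if
    either appears in the other's signature. A greedy proxy for Babelfy's dense subgraph."""
    best = {}
    for frag, senses in cands.items():
        others = {s for f, ss in cands.items() if f != frag for s in ss}
        def score(sense):
            return sum(1 for o in others
                       if o in sig.get(sense, set()) or sense in sig.get(o, set()))
        best[frag] = max(senses, key=score)
    return best

# Example (toy network, hypothetical sense ids):
# graph = {"plant#factory": {"industry"}, "plant#flora": {"botany"},
#          "worker#person": {"industry"}, "industry": {"plant#factory", "worker#person"},
#          "botany": {"plant#flora"}}
# sig = semantic_signatures(graph)
# cands = candidate_meanings("the plant hired a worker",
#                            {"plant": ["plant#factory", "plant#flora"],
#                             "worker": ["worker#person"]})
# disambiguate(cands, sig)  # picks "plant#factory" for the fragment "plant"
```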
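Finally, the first WiBi phase, extracting a hypernym from a page's defining sentence and resolving it against the page inventory, could look roughly like the snippet below; the regular expression and the title lookup are crude, hypothetical stand-ins for the syntactic parsing and disambiguation actually used.

```python
import re

# Hypothetical copula pattern for defining sentences: "<X> is a/an/the <noun phrase> ..."
COPULA = re.compile(
    r"\bis (?:an?|the)\s+(.+?)(?:\s+(?:that|which|of|in|with)\b|[.,;])",
    re.IGNORECASE,
)

def extract_hypernym(definition):
    """Return the head of the noun phrase following 'is a/an/the', or None."""
    match = COPULA.search(definition)
    return match.group(1).split()[-1] if match else None  # crude head: last word of the NP

def link_to_page(hypernym, page_inventory):
    """'Disambiguate' the hypernym by a direct title lookup in the page inventory."""
    return page_inventory.get(hypernym.capitalize())

# Example (hypothetical data):
# extract_hypernym("BabelNet is a multilingual semantic network that connects concepts.")
#   -> "network"
# link_to_page("network", {"Network": "Network (page)"})  ->  "Network (page)"
```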

Bibliographic details

  • Source: 4th Workshop on Cognitive Aspects of the Lexicon
  • Venue: Dublin (IE)
  • Author: Roberto Navigli
  • Affiliation: Dipartimento di Informatica, Sapienza Università di Roma, Viale Regina Elena, 295 - 00166 Roma, Italy
  • Format: PDF
  • Language: English
