Source: Language and technology conference
Exploiting Wikipedia-Based Information-Rich Taxonomy for Extracting Location, Creator and Membership Related Information for ConceptNet Expansion

Abstract

In this paper we present a method for automatically extracting five types of assertions from Japanese Wikipedia XML dump files: IsA assertions (hyponymy relations), AtLocation assertions (the location of an object or place), LocatedNear assertions (neighboring locations), CreatedBy assertions (the creator of an object) and MemberOf assertions (group membership). We use the Hyponymy extraction tool v1.0, which analyses the definition, category and hierarchy structures of Wikipedia articles, to extract IsA assertions and produce an information-rich taxonomy. From this taxonomy we then extract the AtLocation, LocatedNear, CreatedBy and MemberOf assertions using our original method. The experiments show that both methods produce satisfactory results: we acquired 5,866,680 IsA assertions with 96.0% reliability, 131,760 AtLocation assertion pairs with 93.5% reliability, 6,217 LocatedNear assertion pairs with 98.5% reliability, 270,230 CreatedBy assertion pairs with 78.5% reliability and 21,053 MemberOf assertions with 87.0% reliability. Our method surpassed the baseline system in both precision and the number of acquired assertions.