
The WordNet Database: Form, Function, and Use


Abstract

WordNet is a large lexical database of the English language. Like a regular dictionary, it maps base-form words (such as the word run) to their meanings (e.g., "move fast by using one's feet" as well as "a score in baseball"). Unlike a regular dictionary, it encodes a significant amount of additional information about how word meanings and lexical forms are interrelated. Perhaps most helpfully, it marks which words are nearly exactly synonymous, so it can be used as a thesaurus as well as a dictionary. Beyond synonymy, WordNet encodes a number of other relationships, such as the fact that an animal (synonymous with animate being, creature, or fauna) is a type of organism, which is in turn a type of living thing. This is called the semantic relationship of type-subtype, and WordNet encodes semantic and lexical relationships between its entries such as type-subtype, part-whole, substance-whole, member-set, domain-topic, antonymy, and derivationally related forms, among others. In addition to this rich repository of language meaning, WordNet is notable for its size: it contains over 155,000 base wordforms, 117,000 meanings, and 188,000 relationships beyond synonymy, including over 46,000 lexical relationships and 142,000 semantic relationships. WordNet can be of great use to any application that has to interact with natural language text. In this tutorial, we will first learn about the form of the WordNet database: the core concepts, what kinds of relationships are encoded in the database, and some caveats about the database contents. We will also examine a small selection of tasks enabled by each type of information encoded in the database. These tasks are provided only as a sample of potential applications; the range of possible uses is far broader.
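The core data model described above (meanings as sets of synonymous word forms, linked to one another by typed relationships such as type-subtype) can be sketched in a few lines. This is a minimal, hypothetical illustration with hand-written toy data: the synset ids, the glosses, and the names Synset, SYNSETS, INDEX, senses, synonyms, and hypernym_chain are assumptions made for this example, not the real database contents or the JWI API the tutorial uses.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One meaning: a set of synonymous word forms plus a gloss (definition)."""
    id: str
    lemmas: list[str]
    gloss: str
    hypernyms: list[str] = field(default_factory=list)  # ids of more general synsets


# Hand-written toy data for illustration only; ids and glosses are hypothetical.
SYNSETS = {
    "run.v.01": Synset("run.v.01", ["run"], "move fast by using one's feet"),
    "run.n.01": Synset("run.n.01", ["run"], "a score in baseball"),
    "animal.n.01": Synset("animal.n.01", ["animal", "animate being", "creature", "fauna"],
                          "a living creature", hypernyms=["organism.n.01"]),
    "organism.n.01": Synset("organism.n.01", ["organism"], "a living individual",
                            hypernyms=["living_thing.n.01"]),
    "living_thing.n.01": Synset("living_thing.n.01", ["living thing"], "a living entity"),
}

# Word form -> ids of every synset (sense) that contains it.
INDEX: dict[str, list[str]] = {}
for sid, synset in SYNSETS.items():
    for lemma in synset.lemmas:
        INDEX.setdefault(lemma, []).append(sid)


def senses(word):
    """Dictionary lookup: the gloss of every meaning of `word`."""
    return [SYNSETS[sid].gloss for sid in INDEX.get(word, [])]


def synonyms(word):
    """Thesaurus lookup: other lemmas sharing at least one synset with `word`."""
    result = []
    for sid in INDEX.get(word, []):
        for lemma in SYNSETS[sid].lemmas:
            if lemma != word and lemma not in result:
                result.append(lemma)
    return result


def hypernym_chain(sid):
    """Follow the first type-subtype (hypernym) pointer up to a root synset."""
    chain = [sid]
    while SYNSETS[chain[-1]].hypernyms:
        chain.append(SYNSETS[chain[-1]].hypernyms[0])
    return chain


print(senses("run"))
print(synonyms("animal"))             # → ['animate being', 'creature', 'fauna']
print(hypernym_chain("animal.n.01"))  # → ['animal.n.01', 'organism.n.01', 'living_thing.n.01']
```

Note that the word run belongs to two synsets, which is how one word form carries several meanings; the real database additionally separates synsets by part of speech and stores many more pointer types than the single hypernym list kept here.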
The tasks we will learn about range from low-level NLP tasks such as lemmatization, or root finding (given the inflected form "running", return the root "run"; given the irregular form "is", return the root "be"), all the way up to conceptual processing tasks such as determining that cats and dogs are more similar to one another than to turtles, plants, or cars. In addition to the form and utility of the database, we will learn how to interact with the database programmatically. We will first review ways of loading WordNet into common databases such as MySQL, SQLite, PostgreSQL, and the like. After this, we will examine how to interface with the database directly from a Java programming environment, focusing on the MIT Java Wordnet Interface (JWI) library. JWI is small, extremely fast, and easy to use, and it provides API access to all of the information available in the WordNet database.
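The lemmatization task above can be sketched in the style of WordNet's own morphological processor (Morphy), which combines per-part-of-speech exception lists for irregular forms with suffix-detachment rules, keeping only candidates that exist in the lexicon. This is a simplified, hypothetical sketch: the toy lexicon and exception table (VERB_LEMMAS, VERB_EXCEPTIONS) stand in for the real data files, returning only the exception-list entry for irregular forms is a simplification, and the consonant-undoubling step is an assumed addition of ours that lets "running" reach "run" without listing it as an exception.

```python
# Toy lexicon of verb base forms (hypothetical; the real database lists them all).
VERB_LEMMAS = {"run", "be", "go", "bake", "walk", "try"}

# Irregular inflections that no suffix rule can recover (hypothetical subset).
VERB_EXCEPTIONS = {"is": "be", "was": "be", "went": "go"}

# (suffix, replacement) detachment rules for verbs, tried in order.
VERB_RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]


def lemmatize_verb(form):
    """Return base forms of `form` that exist in the toy lexicon."""
    if form in VERB_EXCEPTIONS:          # irregular forms: plain table lookup
        return [VERB_EXCEPTIONS[form]]
    candidates = [form] if form in VERB_LEMMAS else []
    for suffix, replacement in VERB_RULES:
        if not form.endswith(suffix):
            continue
        stem = form[: -len(suffix)] + replacement
        variants = [stem]
        # Assumed extra step (not one of the rules above): undo consonant
        # doubling, so "runn" (from "running") also tries "run".
        if suffix in ("ing", "ed") and not replacement and len(stem) > 2 and stem[-1] == stem[-2]:
            variants.append(stem[:-1])
        for variant in variants:
            # Keep only candidates that are real base forms in the lexicon.
            if variant in VERB_LEMMAS and variant not in candidates:
                candidates.append(variant)
    return candidates


print(lemmatize_verb("running"))  # → ['run']
print(lemmatize_verb("is"))       # → ['be']
print(lemmatize_verb("baking"))   # → ['bake']
```

The lexicon check is what makes over-generating rules safe: "baking" yields both "bake" and "bak", but only "bake" survives.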
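The cats-and-dogs comparison can be made concrete with a path-based measure over the type-subtype hierarchy: two concepts are more similar when fewer hypernym links separate them. The sketch below is a hypothetical illustration under stated assumptions: the PARENT table is a simplified, single-parent hierarchy written for this example (WordNet's real hierarchy is deeper and lets a synset have several hypernyms, which would call for a graph search instead), and the formula 1 / (1 + shortest path length) is one common choice among several WordNet similarity measures, not necessarily the one the tutorial uses.

```python
# Simplified, hypothetical single-parent hierarchy (child -> parent).
PARENT = {
    "cat": "feline", "feline": "carnivore",
    "dog": "canine", "canine": "carnivore",
    "carnivore": "mammal", "mammal": "vertebrate",
    "turtle": "reptile", "reptile": "vertebrate",
    "vertebrate": "animal", "animal": "organism",
    "plant": "organism", "organism": "living thing",
    "living thing": "object",
    "car": "vehicle", "vehicle": "artifact", "artifact": "object",
    "object": "entity",
}


def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(a, b):
    """Edges on the shortest path from a to b via their lowest common ancestor."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(a))}
    for j, n in enumerate(ancestors(b)):
        if n in depth_from_a:  # first shared ancestor is the lowest one
            return depth_from_a[n] + j
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def path_similarity(a, b):
    """1 / (1 + path length): 1.0 for identical nodes, smaller when farther apart."""
    return 1.0 / (1 + path_length(a, b))


for other in ["dog", "turtle", "plant", "car"]:
    print(other, path_similarity("cat", other))
```

In this toy hierarchy cat and dog meet at carnivore (path length 4, similarity 0.2), cat and turtle only at vertebrate, cat and plant at organism, and cat and car not until object, which reproduces the ordering described in the text.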
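Loading WordNet into a relational database amounts to flattening synsets, word senses, and relationships into tables that can then be queried with joins. The sketch below uses SQLite through Python's standard sqlite3 module; the three-table schema, its column names, and the helpers load, find_synonyms, and find_hypernyms are hypothetical choices made for illustration, and existing WordNet-to-SQL distributions use their own layouts.

```python
import sqlite3

# Hypothetical minimal schema: one row per meaning, per (word, meaning) pair,
# and per typed relationship between meanings.
SCHEMA = """
CREATE TABLE synsets   (id TEXT PRIMARY KEY, pos TEXT NOT NULL, gloss TEXT);
CREATE TABLE senses    (lemma TEXT NOT NULL, synset_id TEXT NOT NULL REFERENCES synsets(id));
CREATE TABLE relations (src TEXT NOT NULL, type TEXT NOT NULL, dst TEXT NOT NULL);
CREATE INDEX senses_lemma ON senses(lemma);
"""


def load(conn):
    """Create the tables and insert a few hand-written toy rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO synsets VALUES (?, ?, ?)", [
        ("animal.n.01", "n", "a living creature"),
        ("organism.n.01", "n", "a living individual"),
    ])
    conn.executemany("INSERT INTO senses VALUES (?, ?)", [
        ("animal", "animal.n.01"), ("animate being", "animal.n.01"),
        ("creature", "animal.n.01"), ("fauna", "animal.n.01"),
        ("organism", "organism.n.01"),
    ])
    conn.execute("INSERT INTO relations VALUES ('animal.n.01', 'hypernym', 'organism.n.01')")


def find_synonyms(conn, lemma):
    """Other lemmas that share a synset with `lemma` (self-join on senses)."""
    rows = conn.execute("""
        SELECT DISTINCT s2.lemma FROM senses s1
        JOIN senses s2 ON s1.synset_id = s2.synset_id
        WHERE s1.lemma = ? AND s2.lemma <> s1.lemma
        ORDER BY s2.lemma""", (lemma,))
    return [row[0] for row in rows]


def find_hypernyms(conn, lemma):
    """Lemmas of the more general synsets reached by one hypernym link."""
    rows = conn.execute("""
        SELECT DISTINCT s2.lemma FROM senses s1
        JOIN relations r ON r.src = s1.synset_id AND r.type = 'hypernym'
        JOIN senses s2 ON s2.synset_id = r.dst
        WHERE s1.lemma = ?
        ORDER BY s2.lemma""", (lemma,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
load(conn)
print(find_synonyms(conn, "animal"))     # → ['animate being', 'creature', 'fauna']
print(find_hypernyms(conn, "creature"))  # → ['organism']
```

Because the queries use only standard SQL joins, the same schema should carry over to MySQL or PostgreSQL with minor changes to the connection code and parameter placeholders.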