Broad-coverage hierarchical word sense disambiguation.

Abstract

In naturally occurring language, hearers and readers face large numbers of "ambiguous" words, i.e., words with multiple senses, and "unknown" words, i.e., words that they are encountering for the first time and that cannot yet be in their lexicon. Ambiguous and unknown words seem to cause little difficulty for humans, who infer their syntactic and semantic properties on the fly to resolve ambiguities and to incorporate unknown words into their lexicons. They do, however, pose problems for dictionary-based approaches in natural language processing applications. To use the information contained in a dictionary, it is necessary to associate each word in the text being processed with one of the senses or concepts defined in the dictionary. If the word is ambiguous, the intended sense must be identified among the possible senses of the word; if it is unknown, the word must be assigned to one of all the senses defined by the dictionary. The acquisition of unknown words can thus be seen as a disambiguation task in which the candidate senses are all the senses listed in the dictionary. In this thesis we formulate a single unified approach to learning unknown words and performing word sense disambiguation. We focus on nouns, but our method can be generalized to verbs and other syntactic categories. We propose a broad-coverage method that can be applied to any kind of text, and we frame the problem as a pattern classification task: each ambiguous or unknown word is classified as belonging to one of the existing concepts on the basis of morphological, syntactic and semantic properties of the contexts in which it appears. Our system takes as input an existing dictionary, which defines a hierarchy of concepts, and a corpus of textual data, and disambiguates all nouns in the corpus. We demonstrate this by disambiguating all nouns in a 40-million-word collection of newspaper articles. We also present empirical results from experiments carried out with novel multi-level classification techniques, which exploit generalizations that hold at different levels of the concept hierarchy.
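
The classification framing described in the abstract can be illustrated with a small sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the toy hierarchy, the context-word features (standing in for the morphological, syntactic and semantic properties mentioned above), and the perceptron-style learner that shares weights between a concept and its ancestors. It is not the thesis's algorithm, only one simple way a classifier might exploit generalizations at different levels of a concept hierarchy.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child concept -> parent concept); not from the thesis.
PARENT = {
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
    "boat": "artifact",
    "animal": "entity",
    "artifact": "entity",
}
LEAVES = ["dog", "cat", "car", "boat"]

# Hypothetical training data: (context features of a noun occurrence, gold concept).
EXAMPLES = [
    (["bark", "leash"], "dog"),
    (["purr", "whiskers"], "cat"),
    (["engine", "road"], "car"),
    (["engine", "sail"], "boat"),
]


def path_to_root(concept):
    """Return the concept and its ancestors, excluding the root shared by all."""
    path = []
    while concept in PARENT:
        path.append(concept)
        concept = PARENT[concept]
    return path


def score(weights, features, leaf):
    """Sum the weights of every (node, feature) pair along the leaf's path."""
    return sum(weights[(node, f)] for node in path_to_root(leaf) for f in features)


def predict(weights, features):
    """Pick the highest-scoring leaf; ties go to the earliest entry in LEAVES."""
    return max(LEAVES, key=lambda leaf: score(weights, features, leaf))


def train(examples, epochs=10):
    """Perceptron-style training with weights shared across hierarchy levels."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, gold in examples:
            guess = predict(weights, features)
            if guess == gold:
                continue
            # Promote the gold path and demote the guessed path. Updates at a
            # shared ancestor cancel, so only the distinguishing levels change.
            for node in path_to_root(gold):
                for f in features:
                    weights[(node, f)] += 1.0
            for node in path_to_root(guess):
                for f in features:
                    weights[(node, f)] -= 1.0
    return weights


weights = train(EXAMPLES)
unseen = ["engine", "harbor"]  # "harbor" never occurs in the training data
leaf = predict(weights, unseen)
print(leaf, "->", PARENT[leaf])
```

Because the score of a leaf includes the weights of its ancestors, evidence learned for a coarse concept such as "artifact" also supports its children, which is the kind of cross-level sharing the final sentence of the abstract alludes to. A coarser prediction is available by reading the parent of the predicted leaf.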
