
Automatic Supervised Thesauri Construction with 'Roget's Thesaurus'.


Abstract

Thesauri are important tools for many Natural Language Processing applications. Roget's Thesaurus is particularly useful: it is of high quality and has been in development for over a century and a half. Yet its applications have been limited, largely because the only publicly available edition dates from 1911. This thesis proposes and tests methods of automatically updating the vocabulary of the 1911 Roget's Thesaurus.

I use the Thesaurus itself as a source of training data, learning from Roget's in order to update Roget's. The lexicon is updated in two stages. First, I develop a measure of semantic relatedness that enhances existing distributional techniques: known sets of synonyms from Roget's are used to train a distributional measure to better identify near-synonyms. Second, I use the new measure of semantic relatedness to find where in Roget's to place a new word. Existing words from Roget's are used as training data to tune the parameters of three methods of inserting words. Over 5000 new words and word senses were added using this process.

I conduct two kinds of evaluation on the updated Thesaurus. The first evaluates the procedure for updating Roget's. This is accomplished by removing some words from the Thesaurus and testing my system's ability to reinsert them in the correct location. Human evaluation of the newly added words is also performed: annotators must determine whether a newly added word is in the correct location. They found that in most cases the new words were almost indistinguishable from those already existing in Roget's Thesaurus.

The second kind of evaluation establishes the usefulness of the updated Roget's Thesaurus in actual Natural Language Processing applications. These applications include determining semantic relatedness between word pairs or sentence pairs, identifying the best synonym from a set of candidates, solving SAT-style analogy problems, pseudo-word-sense disambiguation, and sentence ranking for text summarization. The updated Thesaurus consistently performed at least as well as, or better than, the original Thesaurus on all these applications.
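The remove-and-reinsert evaluation described in the abstract can be illustrated with a toy sketch. This is not the thesis's method: the abstract does not give its relatedness measure or its three insertion methods, so the sketch assumes plain cosine similarity over context-count vectors and a single "highest mean similarity" placement rule. The names `cosine`, `best_category`, and `reinsertion_accuracy`, and the two-category thesaurus, are all hypothetical.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_category(word, thesaurus, vectors):
    """Pick the category whose members are, on average, most similar to `word`.

    `word` itself is excluded from every category, which simulates having
    removed it from the thesaurus before trying to place it.
    """
    def score(members):
        others = [m for m in members if m != word]
        if not others:
            return 0.0
        return sum(cosine(vectors[word], vectors[m]) for m in others) / len(others)
    return max(thesaurus, key=lambda cat: score(thesaurus[cat]))

def reinsertion_accuracy(thesaurus, vectors):
    """Fraction of existing words placed back into their original category."""
    trials = [(w, cat) for cat, members in thesaurus.items() for w in members]
    hits = sum(best_category(w, thesaurus, vectors) == cat for w, cat in trials)
    return hits / len(trials)

# Toy data (assumed for illustration): context counts and a tiny thesaurus.
vectors = {
    "car": Counter(drive=3, road=2),
    "truck": Counter(drive=2, road=3),
    "bus": Counter(drive=2, road=2, ticket=1),
    "apple": Counter(eat=3, sweet=2),
    "pear": Counter(eat=2, sweet=3),
    "plum": Counter(eat=2, sweet=2, tree=1),
}
thesaurus = {
    "vehicle": ["car", "truck", "bus"],
    "fruit": ["apple", "pear", "plum"],
}

accuracy = reinsertion_accuracy(thesaurus, vectors)
```

The same `best_category` call can place a word that is not yet in the thesaurus, provided a context vector is available for it. In a real setting the held-out accuracy would be used to tune the placement parameters before any new words are added.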

Bibliographic record

  • Author

    Kennedy, Alistair

  • Author affiliation

    University of Ottawa (Canada)

  • Degree-granting institution University of Ottawa (Canada)
  • Subject Information Science; Artificial Intelligence
  • Degree Ph.D.
  • Year 2012
  • Pages 229 p.
  • Total pages 229
  • Original format PDF
  • Language of text English (eng)

  • Date added to database 2022-08-17 11:43:00
