Journal of Language Modelling

Evaluation of automatic updates of Roget’s Thesaurus

Abstract

Thesauri and similarly organised resources attract increasing interest from Natural Language Processing researchers. Thesauri age fast, so there is a constant need to update their vocabulary. Since a manual update cycle takes considerable time, automated methods are required. This work presents a tuneable method of measuring semantic relatedness, trained on Roget’s Thesaurus, which generates lists of terms related to words not yet in the Thesaurus. Using these lists of terms, we experiment with three methods of adding words to the Thesaurus. We add, with high confidence, over 5500 and 9600 new words and word senses to the 1911 and 1987 versions of Roget’s Thesaurus, respectively. We evaluate our work both manually and by applying the updated thesauri in three NLP tasks: selection of the best synonym from a set of candidates, pseudo-word-sense disambiguation, and SAT-style analogy problems. We find that the newly added words are of high quality. The additions significantly improve the performance of Roget’s-based methods in these NLP tasks, and that performance compares favourably with the performance of WordNet-based methods. Our methods are general enough to work with different versions of Roget’s Thesaurus.