Conference: Advances in Natural Language Processing

Word Sense Disambiguation of Farsi Homographs Using Thesaurus and Corpus


Abstract

This paper describes the disambiguation of Farsi homographs in unrestricted text using a thesaurus and a corpus. The proposed method builds on an earlier approach, with two differences. First, it uses collocational information to avoid collecting spurious contexts caused by polysemous words in thesaurus categories. Second, all words in the test-data context, even those that do not appear in the collected contexts, contribute to the calculation of the conceptual classes' scores. Using a Farsi corpus and a Farsi thesaurus, this method correctly disambiguated 91.46% of the instances of 15 Farsi homographs. The method was compared with three supervised corpus-based methods: Naive Bayes, Exemplar-based, and Decision List. Unlike supervised methods, it needs no training data and performs well on the disambiguation of uncommon words. In addition, it can be used to remove some kinds of morphological ambiguity.
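The idea of scoring conceptual classes from every word in the test context can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so this is not the paper's method: the add-one smoothing, the function names (train_class_models, score_classes, disambiguate), the class labels, and the English toy words are all assumptions made for illustration. Smoothing is used here only so that context words never seen in the collected contexts still contribute to each class's score.

```python
import math
from collections import Counter


def train_class_models(class_contexts):
    """Count the context words collected for each conceptual class."""
    return {cls: Counter(words) for cls, words in class_contexts.items()}


def score_classes(context, class_counts, alpha=1.0):
    """Score each class by summing smoothed log-probabilities of all context words.

    Add-alpha smoothing (an assumption, not taken from the paper) gives
    unseen words a small non-zero probability, so they still contribute.
    """
    vocab = set(context)
    for counts in class_counts.values():
        vocab.update(counts)
    scores = {}
    for cls, counts in class_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        scores[cls] = sum(math.log((counts[w] + alpha) / total) for w in context)
    return scores


def disambiguate(context, class_counts):
    """Return the conceptual class with the highest score for this context."""
    scores = score_classes(context, class_counts)
    return max(scores, key=scores.get)


# Hypothetical toy data: contexts collected for two senses of one homograph.
models = train_class_models({
    "FINANCE": ["money", "loan", "account", "money"],
    "RIVER": ["water", "shore", "fish"],
})

# "tomorrow" never occurs in the collected contexts but is still scored.
print(disambiguate(["loan", "money", "tomorrow"], models))
```

In this sketch the class whose collected contexts best match the test context wins, and an unseen word lowers every class's score rather than being ignored.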
