Journal: Computer Science & Information Technology

Improvement WSD Dictionary Using Annotated Corpus and Testing it with Simplified Lesk Algorithm



Abstract

Word sense disambiguation (WSD) is a task with a long history in computational linguistics and remains an open problem in NLP. This research focuses on increasing the accuracy of the Lesk algorithm with the assistance of an annotated corpus, Narodowy Korpus Języka Polskiego (NKJP, "Polish National Corpus"). The NKJP_WSI (NKJP Word Sense Inventory) is used as the sense inventory. The Lesk algorithm is first run on the whole corpus (training and test) to obtain results. This is done with the assistance of a special dictionary that contains all possible senses for each ambiguous word. In this implementation, a similarity equation from information retrieval based on tf-idf is applied, with a small modification to meet the requirements. Experimental results show an accuracy of 82.016% without deleting stop words and 84.063% with deleting them. Moreover, this paper addresses the practical challenge of execution time. To that end, we propose a special structure for building another dictionary from the corpus in order to reduce the time complexity of the training process. The new dictionary contains all the relevant words (only those that help in solving WSD) with their tf-idf values, derived from the existing dictionary with the assistance of the annotated corpus. Furthermore, experimental results show that the two tests give identical results. The execution time of the second test dropped by a factor of 20 compared with the first test, at the same accuracy.
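The general idea in the abstract, a simplified Lesk scorer that weights context overlap by tf-idf and reads those weights from a dictionary built once in advance, can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the abstract does not give the authors' exact similarity equation or dictionary structure, so the tf-idf formula, the toy English sense inventory `SENSES`, the stop-word list, and the function names `build_weight_dictionary` and `simplified_lesk` are all hypothetical and do not come from the paper or from NKJP_WSI.

```python
import math
from collections import Counter

# Hypothetical toy sense inventory: lemma -> sense id -> bag of words.
# Illustrative English data only; the paper uses NKJP_WSI for Polish.
SENSES = {
    "bank": {
        "bank/finance": "money deposit account loan money institution".split(),
        "bank/river": "river water shore edge land".split(),
    }
}

# Hypothetical stop-word list for the toy example.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "at", "to", "and"}


def build_weight_dictionary(senses):
    """Precompute a tf-idf weight for every (lemma, sense, word) once,
    so that disambiguation later needs only dictionary lookups."""
    weights = {}
    for lemma, sense_bags in senses.items():
        n_senses = len(sense_bags)
        # Document frequency: number of senses of this lemma containing the word.
        df = Counter(w for bag in sense_bags.values() for w in set(bag))
        weights[lemma] = {}
        for sense_id, bag in sense_bags.items():
            tf = Counter(bag)
            # Assumed weighting: relative term frequency times log(N / df).
            # Words shared by every sense get weight 0 (not discriminative).
            weights[lemma][sense_id] = {
                w: (count / len(bag)) * math.log(n_senses / df[w])
                for w, count in tf.items()
            }
    return weights


def simplified_lesk(lemma, context, weights, remove_stop_words=True):
    """Return (best sense id, scores) for `lemma` given context tokens.
    Each sense is scored by the summed tf-idf weight of the context words
    it contains; ties fall back to the first sense in the inventory."""
    words = [w.lower() for w in context]
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    scores = {
        sense_id: sum(word_weights.get(w, 0.0) for w in words)
        for sense_id, word_weights in weights[lemma].items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    weights = build_weight_dictionary(SENSES)  # built once, reused per query
    sentence = "I opened an account at the bank to deposit money".split()
    sense, scores = simplified_lesk("bank", sentence, weights)
    print(sense, scores)
```

Building the weight dictionary once and reusing it for every query is the kind of precomputation that could account for the lower execution time the abstract reports, although the actual structure the authors used may differ.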
