Conference: Nirma University International Conference on Engineering

Hierarchical clustering technique for word sense disambiguation using Hindi WordNet

Abstract

Word Sense Disambiguation (WSD) is crucial to nearly every application of computational linguistics and remains a challenging problem in Natural Language Processing (NLP). Although many WSD algorithms are available, little work has been done on choosing the optimal algorithm. There are three approaches to WSD: knowledge-based, supervised, and unsupervised. These approaches can also be combined. The supervised approach needs a large, manually created sense-annotated corpus, which takes considerable time and effort to build. The knowledge-based approach requires machine-readable dictionaries, sense inventories, thesauri, and similar resources, which depend on their compilers' own interpretation of a word's senses. The unsupervised approach uses a sense-unannotated corpus and rests on the assumption that words that co-occur are similar. This research addresses the Hindi language. It applies a hierarchical clustering algorithm with three similarity measures (cosine, Jaccard, and Dice), and the resulting clusters are overlapped with Hindi WordNet, a product of IIT Bombay. This improves word sense disambiguation results, because clustering groups similar words together.
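The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each occurrence of an ambiguous word is represented by the set of words that co-occur with it, uses average-linkage agglomerative clustering (the linkage criterion is an assumption), and reads "overlapped with Hindi WordNet" as picking the sense whose gloss words best match a cluster. The names `agglomerate` and `best_sense`, the toy English contexts, and the `glosses` dictionary are all hypothetical stand-ins; no real Hindi WordNet API is used.

```python
from itertools import combinations
from math import sqrt

# Three set-based similarity measures named in the abstract.
def cosine(a, b):
    """Cosine similarity of two binary context vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def agglomerate(contexts, sim, threshold):
    """Average-linkage agglomerative clustering (assumed linkage).

    contexts maps an occurrence label to its set of co-occurring words.
    The most similar pair of clusters is merged repeatedly until no pair
    reaches the similarity threshold.
    """
    clusters = [[label] for label in contexts]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(sim(contexts[x], contexts[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so j is still valid
        del clusters[j]
    return clusters

def best_sense(cluster, contexts, glosses, sim):
    """Pick the sense whose gloss words overlap most with the cluster."""
    words = set().union(*(contexts[label] for label in cluster))
    return max(glosses, key=lambda sense: sim(words, glosses[sense]))

# Hypothetical toy data: four occurrences of an ambiguous word.
contexts = {
    "s1": {"river", "water", "shore"},
    "s2": {"river", "shore", "boat"},
    "s3": {"money", "loan", "account"},
    "s4": {"money", "account", "deposit"},
}
# Hypothetical gloss word sets standing in for a sense inventory.
glosses = {
    "bank/river": {"river", "shore", "land", "water"},
    "bank/finance": {"money", "account", "institution", "loan"},
}

clusters = agglomerate(contexts, jaccard, threshold=0.3)
senses = [best_sense(c, contexts, glosses, jaccard) for c in clusters]
```

On this toy data the two river-related occurrences and the two money-related occurrences end up in separate clusters under all three measures. The threshold of 0.3 is arbitrary and would need tuning on real data.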
