Conference: Nirma University International Conference on Engineering

Hierarchical clustering technique for word sense disambiguation using Hindi WordNet

Abstract

Word Sense Disambiguation (WSD) is crucial to every application of computational linguistics, and it remains a challenging problem in Natural Language Processing (NLP). Although many WSD algorithms are available, little work has been done on choosing the optimal algorithm for the task. There are three approaches to WSD: knowledge-based, supervised, and unsupervised, and they can also be combined. The supervised approach needs a large, manually created sense-annotated corpus, which takes considerable time and effort to build. The knowledge-based approach requires machine-readable dictionaries, sense inventories, thesauri, and similar resources, which depend on their compilers' own interpretation of each word's senses. The unsupervised approach, by contrast, uses a sense-unannotated corpus and rests on the observation that words which co-occur tend to be similar. This research targets the Hindi language. It applies a hierarchical clustering algorithm with three similarity measures (cosine, Jaccard, and Dice) and overlaps the resulting clusters with Hindi WordNet, a product of IIT Bombay. Because clustering groups similar words together, this overlap improves the word sense disambiguation result.
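The abstract names the three similarity measures but does not spell out the pipeline, so the following is only a minimal sketch under stated assumptions. Each occurrence of an ambiguous word is assumed to be represented by a bag of its context words; clusters are assumed to be built by average-linkage agglomerative clustering that stops at a similarity threshold; and the "overlap with Hindi WordNet" step is approximated by labeling each cluster with the sense whose synset words overlap most with the cluster's pooled context. The linkage, the threshold, all function names, and the toy sense inventory (`sona_sleep`, `sona_gold` for सोना, "to sleep" / "gold") are hypothetical. This is not the authors' implementation and it does not call any Hindi WordNet API.

```python
from collections import Counter
from itertools import combinations
import math


# Similarity measures over bag-of-words context vectors (Counter: word -> count).
def cosine(a: Counter, b: Counter) -> float:
    """cos(a, b) = a.b / (|a| |b|), computed on word counts."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: Counter, b: Counter) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of context words."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a: Counter, b: Counter) -> float:
    """2|A ∩ B| / (|A| + |B|) over the sets of context words."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def agglomerative(contexts, sim, threshold):
    """Average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per context and repeatedly merges the most similar
    pair of clusters; stops when the best average similarity falls below
    `threshold` (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(contexts))]

    def linkage(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        return sum(sim(contexts[i], contexts[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


def assign_senses(clusters, contexts, synsets):
    """Label each cluster with the sense whose synset words overlap most with
    the cluster's pooled context words (hypothetical overlap step)."""
    labels = []
    for cluster in clusters:
        pooled = set().union(*(contexts[i] for i in cluster))
        labels.append(max(synsets, key=lambda sense: len(pooled & synsets[sense])))
    return labels


# Toy data: occurrences of सोना (sona), with transliterated context words.
contexts = [
    Counter(["raat", "neend", "bistar"]),    # night, sleep, bed
    Counter(["neend", "raat", "thakaan"]),   # sleep, night, tiredness
    Counter(["gehna", "keemat", "bazaar"]),  # jewelry, price, market
    Counter(["gehna", "keemat", "chandi"]),  # jewelry, price, silver
]
# Hypothetical stand-in for two Hindi WordNet senses of सोना.
synsets = {
    "sona_sleep": {"neend", "raat", "bistar"},
    "sona_gold": {"gehna", "chandi", "dhaatu"},
}

clusters = agglomerative(contexts, jaccard, threshold=0.3)
print(clusters)                                    # [[0, 1], [2, 3]]
print(assign_senses(clusters, contexts, synsets))  # ['sona_sleep', 'sona_gold']
```

Passing `cosine` or `dice` instead of `jaccard` changes only the pairwise scores; on this toy data all three measures yield the same two clusters, while on real corpora they can disagree, which is one reason to compare them.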