Conference: International conference on natural language processing

Using Word Embeddings for Bilingual Unsupervised WSD


Abstract

Unsupervised Word Sense Disambiguation (WSD) is one of the challenging problems in natural language processing. Recently, an unsupervised bilingual WSD approach has been proposed. This approach uses a context-aware EM formulation to estimate the sense distribution from the co-occurrence counts of cross-linked words in comparable corpora, with WordNet-based similarity measures approximating those counts. In this paper, we extend the existing approach by exploring whether Word Embeddings can be used to approximate these counts. We evaluated our approach on the Hindi-Marathi language pair in the health domain. Using a combination of Word Embeddings and WordNet-based similarity measures, we observed improvements of 8.5% and 2.5% in the F-score of verbs and adjectives, respectively, for Marathi, and a 7% improvement in the F-score of adjectives for Hindi. The experiments show that the combination of Word Embeddings and WordNet-based similarity measures is a reasonable approximation for bilingual WSD.
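
The abstract does not give the formula used to combine the two similarity sources, so the following is only an illustrative sketch of the general idea: an embedding-based cosine similarity and a WordNet-based similarity score are interpolated and used as a stand-in for a co-occurrence count. The function names, the linear interpolation, the weight `alpha`, and the toy vectors are all assumptions made for this example, and the final step is a simple normalization rather than the EM procedure the paper builds on.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(emb_a, emb_b, wordnet_sim, alpha=0.5):
    """Hypothetical mix of embedding and WordNet-based similarity.

    wordnet_sim is assumed to be a precomputed score in [0, 1];
    alpha is an assumed interpolation weight, not taken from the paper.
    """
    emb_sim = max(0.0, cosine(emb_a, emb_b))  # clip negatives to 0
    return alpha * emb_sim + (1 - alpha) * wordnet_sim


def sense_distribution(candidates, context_embs, alpha=0.5):
    """Normalize summed similarities into a distribution over senses.

    candidates maps a sense id to (embedding, [wordnet_sim per context word]).
    This is a single normalization pass, not an EM iteration.
    """
    scores = {}
    for sense, (emb, wn_sims) in candidates.items():
        scores[sense] = sum(
            combined_similarity(emb, ctx, wn, alpha)
            for ctx, wn in zip(context_embs, wn_sims)
        )
    total = sum(scores.values())
    if total == 0:
        return {sense: 1.0 / len(scores) for sense in scores}
    return {sense: score / total for sense, score in scores.items()}
```

For instance, with two candidate senses and one context word, the sense whose embedding and WordNet score are both closer to the context word receives the larger share of the probability mass.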
