Conference: IEEE International Conference on Data Mining Workshops

Tovel: Distributed Graph Clustering for Word Sense Disambiguation

Abstract

Word sense disambiguation is a fundamental problem in natural language processing (NLP). In this problem, a large corpus of documents contains mentions of well-known (non-ambiguous) words, together with mentions of ambiguous ones. The goal is to compute a clustering of the corpus such that documents that refer to the same meaning appear in the same cluster; subsequently, each cluster is assigned a different semantic meaning. In this paper, we propose a mechanism for word sense disambiguation based on distributed graph clustering that is incremental in nature and can scale to big data. A novel heuristic vertex-centric algorithm based on the metaphor of the water cycle is used to cluster the graph. Our approach is evaluated on real datasets in both centralized and decentralized environments.
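
To make the clustering formulation concrete, here is a minimal sketch under stated assumptions. It is not the water-cycle heuristic from the paper, whose details the abstract does not give; it swaps in plain label propagation, a generic vertex-centric technique in which each vertex only looks at its neighbours. The document names, context words, and the functions `build_graph` and `label_propagation` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(docs):
    """Connect two documents when they share at least one context word.

    The edge weight is the number of shared context words.
    """
    graph = defaultdict(dict)
    for a, b in combinations(sorted(docs), 2):
        shared = len(docs[a] & docs[b])
        if shared:
            graph[a][b] = shared
            graph[b][a] = shared
    for doc in docs:
        graph.setdefault(doc, {})  # keep isolated documents as vertices
    return graph


def label_propagation(graph, max_rounds=20):
    """Generic label propagation (a stand-in, NOT the paper's algorithm).

    Each vertex repeatedly adopts the label with the largest total edge
    weight among its neighbours; ties go to the smallest label.
    """
    labels = {v: v for v in graph}  # every vertex starts in its own cluster
    for _ in range(max_rounds):
        changed = False
        for v in sorted(graph):
            if not graph[v]:
                continue  # an isolated vertex keeps its own label
            scores = defaultdict(int)
            for u, weight in graph[v].items():
                scores[labels[u]] += weight
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy corpus: each document is reduced to the set of
# non-ambiguous words that appear alongside one ambiguous word.
docs = {
    "d1": {"engine", "speed", "road"},
    "d2": {"engine", "road", "dealer"},
    "d3": {"speed", "dealer", "engine"},
    "d4": {"jungle", "prey", "cat"},
    "d5": {"jungle", "cat", "habitat"},
    "d6": {"prey", "habitat", "cat"},
}
labels = label_propagation(build_graph(docs))
print(labels)
```

On this toy input the first three documents end up sharing one label and the last three another, so each cluster can then be mapped to one sense of the ambiguous word. Because every update reads only a vertex's neighbours, the same loop could in principle be partitioned across machines, which is the property a vertex-centric design relies on.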
