IEEE International Conference on Data Mining Workshops

Tovel: Distributed Graph Clustering for Word Sense Disambiguation

Abstract

Word sense disambiguation is a fundamental problem in natural language processing (NLP). In this problem, a large corpus of documents contains mentions of well-known (non-ambiguous) words together with mentions of ambiguous ones. The goal is to compute a clustering of the corpus such that documents that refer to the same meaning appear in the same cluster; subsequently, each cluster is assigned to a different semantic meaning. In this paper, we propose a mechanism for word sense disambiguation based on distributed graph clustering that is incremental in nature and can scale to big data. A novel heuristic vertex-centric algorithm based on the metaphor of the water cycle is used to cluster the graph. Our approach is evaluated on real datasets in both centralized and decentralized environments.
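
The abstract does not describe the water-cycle algorithm itself, so the sketch below does not attempt to reproduce it. As a rough illustration of what "vertex-centric" graph clustering can look like, it uses plain label propagation instead, a different and much simpler technique: each vertex repeatedly adopts the most common label among its neighbors. The function name `label_propagation`, the document ids, and the toy edges are all hypothetical; an edge here is assumed to mean that two documents share context words.

```python
from collections import Counter


def label_propagation(edges, max_rounds=20):
    """Cluster an undirected graph with deterministic label propagation.

    NOTE: this is a generic stand-in, not the Tovel water-cycle algorithm.
    """
    # Build adjacency sets from the undirected edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Every vertex starts in its own cluster, labeled by its own id.
    labels = {v: v for v in neighbors}

    for _ in range(max_rounds):
        changed = False
        # Visit vertices in a fixed order so the result is reproducible.
        for v in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[v])
            best = max(counts.values())
            # Break ties by choosing the smallest label.
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy corpus: d1-d3 share one context, d4-d6 share another.
toy_edges = [
    ("d1", "d2"), ("d2", "d3"), ("d1", "d3"),
    ("d4", "d5"), ("d5", "d6"), ("d4", "d6"),
]
toy_labels = label_propagation(toy_edges)
print(len(set(toy_labels.values())))  # 2
```

Each vertex only reads its neighbors' state, which is why this style of update suits distributed, vertex-centric execution. Note that plain label propagation can merge weakly connected groups into a single cluster, so it should be read only as an illustration of the computation model.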
