Conference: International Conference on Software Engineering and Data Engineering

DOCUMENT CLUSTERING USING WORD SENSE DISAMBIGUATION


Abstract

In computational linguistics, word sense disambiguation (WSD) is the problem of determining which sense of a word with several distinct senses is used in a given sentence. This paper addresses text document clustering, one of the major tasks of text processing. Document clustering is the process of discovering groups of related information in text documents and assigning each document to the most relevant group. Large document corpora suffer from ambiguity problems such as synonymy, polysemy, and other semantic relations. For this reason, we perform WSD on all terms in all documents and use the best sense of each term as a document feature in the clustering process. Our experimental results show that the efficiency of document clustering using WSD increases linearly with the size of the document dataset. Different part-of-speech (POS) taggers were tested to determine the best one, and the effect of different window sizes on the WSD task was compared.
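The abstract does not name the WSD algorithm, the sense inventory, or the clustering method, so the following is only a minimal sketch of the general idea: disambiguate each term from a window of neighbouring words, then use the resulting senses as document features. It assumes a simplified Lesk-style gloss overlap and a tiny hand-made inventory; `SENSES`, `disambiguate`, `sense_features`, and `cosine` are hypothetical names introduced for illustration, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy sense inventory (an assumption, not from the paper):
# word -> {sense id: set of words from that sense's gloss}.
SENSES = {
    "bank": {
        "bank#finance": {"money", "deposit", "loan", "account", "cash"},
        "bank#river": {"river", "water", "shore", "fishing", "stream"},
    },
}


def disambiguate(tokens, i, window=3):
    """Return the sense of tokens[i] whose gloss overlaps most with the
    words up to `window` positions to the left and right."""
    word = tokens[i]
    if word not in SENSES:
        return word  # words without an inventory entry are kept as-is
    context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    # On ties, max() returns the first sense listed in the inventory.
    return max(SENSES[word], key=lambda s: len(SENSES[word][s] & context))


def sense_features(text, window=3):
    """Bag of disambiguated terms for one document."""
    tokens = text.lower().split()
    return Counter(disambiguate(tokens, i, window) for i in range(len(tokens)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


docs = [
    "deposit money bank account",
    "bank loan cash",
    "fishing river bank water",
]
features = [sense_features(d) for d in docs]
```

In this toy corpus the two financial documents share the feature `bank#finance`, while the river document gets `bank#river` and has no feature in common with them, so a similarity-based clustering algorithm would no longer be misled by the shared surface form "bank". Calling `sense_features(docs[2], window=0)` removes all context and falls back to the first listed sense, which hints at why the window size matters for the WSD step.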
