DOCUMENT CLUSTERING USING WORD SENSE DISAMBIGUATION

Abstract

In computational linguistics, word sense disambiguation (WSD) is the problem of determining which sense of a word with several distinct senses is used in a given sentence.

This paper treats text document clustering as one of the major tasks of text processing. Document clustering is the process of finding groups of related information in a collection of text documents and assigning each document to the most relevant group. Large document corpora suffer from ambiguity problems such as synonymy, polysemy, and other semantic relations. For this reason, we perform WSD on every term in every document to select its best sense. That sense is then used as a document feature in the clustering process.

Our experimental results show that the efficiency of document clustering using WSD increases linearly with the size of the document dataset. Different part-of-speech (POS) taggers were tested to determine the best one, and the effect of different window sizes on the WSD task was also compared.
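The abstract does not name the WSD algorithm, POS tagger, or clustering method used, so the following Python sketch is an illustration only. It assumes two stand-ins, neither of which is the authors' method:

- simplified Lesk (gloss overlap within a ±`window` token context) as the disambiguator;
- single-pass leader clustering over cosine similarity as the clustering step.

The sense inventory `SENSES`, its glosses, the stopword list, and the function names `disambiguate`, `features` and `leader_cluster` are hypothetical. POS tagging is omitted.

```python
import math
from collections import Counter

# Hypothetical toy sense inventory (word -> sense id -> gloss words).
# Invented for illustration; a real system would draw senses from a lexical resource.
SENSES = {
    "bank": {
        "bank.finance": {"money", "deposit", "loan", "account", "interest"},
        "bank.river": {"river", "water", "shore", "slope", "fish"},
    },
    "interest": {
        "interest.finance": {"money", "loan", "rate", "bank", "paid"},
        "interest.curiosity": {"attention", "curiosity", "hobby", "topic"},
    },
}
# Hypothetical minimal stopword list, just enough for the toy documents below.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "to", "we", "near", "every"}


def tokenize(doc):
    return [t for t in doc.lower().split() if t not in STOPWORDS]


def disambiguate(tokens, i, window):
    """Simplified Lesk (assumed stand-in): choose the sense of tokens[i] whose
    gloss shares the most words with tokens at most `window` positions away."""
    word = tokens[i]
    if word not in SENSES:
        return word  # terms outside the inventory are used as-is
    context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    # max() keeps the first maximal sense, so ties fall back to the first listed one
    return max(SENSES[word], key=lambda s: len(SENSES[word][s] & context))


def features(doc, window=3, use_wsd=True):
    """Bag of senses per document (bag of surface words when use_wsd is False)."""
    tokens = tokenize(doc)
    return Counter(disambiguate(tokens, i, window) if use_wsd else t
                   for i, t in enumerate(tokens))


def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items())  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(docs, threshold=0.2, window=3, use_wsd=True):
    """Single-pass leader clustering (assumed stand-in): join the most similar
    cluster if its leader is at least `threshold` similar, else start a new one."""
    feats = [features(d, window, use_wsd) for d in docs]
    leaders, labels = [], []
    for idx, f in enumerate(feats):
        sims = [cosine(f, feats[j]) for j in leaders]
        if sims and max(sims) >= threshold:
            labels.append(sims.index(max(sims)))
        else:
            leaders.append(idx)  # this document leads a new cluster
            labels.append(len(leaders) - 1)
    return labels


docs = [
    "deposit money in the bank to earn interest",
    "the bank raised the interest rate on every loan",
    "fish swim near the river bank",
    "we sat on the grassy bank of the river",
]
print(leader_cluster(docs))                 # [0, 0, 1, 1]
print(leader_cluster(docs, use_wsd=False))  # [0, 0, 0, 0]
```

**Sense features vs. surface words.** On these four toy documents, sense features separate the finance and river uses of "bank" into two clusters. Surface words merge all four documents into one cluster, because the shared token "bank" pushes similarity over the threshold.

**Effect of window size.** With `window=0`, the disambiguator sees no context, so every ambiguous word falls back to its first listed sense. This is one simple way to see why window size can affect WSD quality.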