
AUTOMATIC THESAURUS CONSTRUCTION USING WORD CLUSTERING



Abstract

In this paper, we propose a new clustering algorithm for large-scale document collections that constructs a thesaurus automatically as an aid to summarization. Existing word-clustering systems use various similarity measures and clustering algorithms developed in the context of information retrieval. When clustering is based on a term-document matrix, the distribution of an index word represents only how frequently the word appears in the contents of a document. Therefore, the semantic relation captured between words in the document is not very strong. As a result, words that appear frequently in the contents tend to be gathered into one cluster. To construct a cluster set that captures the semantic relations between words, we present a word-clustering method that automatically uses pairs of words in a co-occurrence relation. We further show that our clustering is effective for word sense disambiguation compared with clustering based on a term-document matrix.
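The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify; it only shows one plausible way to count co-occurring word pairs and compare words by their co-occurrence profiles. The toy sentences, the sentence-level co-occurrence window, the cosine measure, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the words it shares a sentence with.

    Assumption: two words "co-occur" when they appear in the same sentence.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy corpus in which "bank" is used in two senses.
sentences = [
    "bank river water",
    "shore river water",
    "bank money loan",
    "credit money loan",
]
vectors = cooccurrence_vectors(sentences)

# "shore" and "credit" share no co-occurring words, while the ambiguous
# "bank" overlaps with the context of both.
print(cosine(vectors["shore"], vectors["credit"]))
print(cosine(vectors["shore"], vectors["bank"]))
print(cosine(vectors["credit"], vectors["bank"]))
```

In this sketch a word is described by the words it appears with rather than by the documents it appears in, so the two contexts of an ambiguous word such as "bank" remain visible in its profile. A clustering step would then group words by these similarities; which clustering procedure the paper uses is not stated in the abstract.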
