Pacific Association for Computational Linguistics Conference

AUTOMATIC THESAURUS CONSTRUCTION USING WORD CLUSTERING

Abstract

In this paper, we propose a new clustering algorithm for large-scale document collections that constructs a thesaurus automatically as an aid to summarization. Existing word-clustering systems use various similarity measures and clustering algorithms in the context of information retrieval. When clustering is based on a term-document matrix, the distribution of an index word represents how frequently the word appears in the contents of each document. The semantic relation between words captured in this way is therefore not very strong, and words that appear frequently in the contents tend to be gathered into a single cluster. To construct a cluster set that reflects the semantic relations between words, we present a word-clustering method that automatically uses pairs of words in a co-occurrence relation. We further show that our clustering is more effective for word sense disambiguation than clustering based on a term-document matrix.
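
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the contrast it draws, not the authors' method. It builds both word representations from a toy corpus: a term-document frequency vector per word, and a vector of sentence-level co-occurrence counts per word. The corpus, the sentence-sized co-occurrence window, the cosine measure, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy corpus: each document is a list of tokenized sentences.
docs = [
    [["bank", "loan", "interest"], ["river", "bank", "water"]],
    [["loan", "interest", "rate"], ["river", "water", "flow"]],
]

def term_document_vectors(docs):
    """Map each word to a sparse {document index: frequency} vector."""
    vectors = {}
    for i, doc in enumerate(docs):
        for sentence in doc:
            for word in sentence:
                vectors.setdefault(word, Counter())[i] += 1
    return vectors

def cooccurrence_vectors(docs):
    """Map each word to counts of the words that share a sentence with it."""
    vectors = {}
    for doc in docs:
        for sentence in doc:
            # Assumption: a co-occurrence pair is two distinct words in one sentence.
            for a, b in combinations(sorted(set(sentence)), 2):
                vectors.setdefault(a, Counter())[b] += 1
                vectors.setdefault(b, Counter())[a] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

td = term_document_vectors(docs)
co = cooccurrence_vectors(docs)

# "loan" and "river" occur once in each document, so their term-document
# vectors are identical even though the words are unrelated.
print(cosine(td["loan"], td["river"]))
# Their co-occurrence vectors share only the neighbour "bank".
print(cosine(co["loan"], co["river"]))
```

In this toy data the term-document view cannot tell "loan" from "river" because both occur once in every document, while the co-occurrence view separates them by their neighbouring words. Either set of vectors could then be passed to any standard clustering routine.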