
Text document clustering based on frequent concepts


Abstract

This paper presents a novel document clustering technique based on frequent concepts. The proposed algorithm, FCDC (Frequent Concepts based Document Clustering), works with frequent concepts rather than the frequent itemsets used in traditional text mining techniques. Many well-known clustering algorithms treat documents as bags of words and ignore important relationships between words, such as synonymy. The proposed algorithm uses the semantic relationships between words to create concepts. It in turn exploits the WordNet ontology to create a low-dimensional feature vector, which allows a more accurate clustering algorithm to be developed.
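The abstract does not give the details of FCDC, so the following is only a minimal sketch of the general idea it describes: map synonymous words to a shared concept, keep the concepts that occur in enough documents, and use those as a low-dimensional feature space. The synonym table `SYNONYM_TO_CONCEPT`, the stopword list, the `min_support` threshold, and all function names are assumptions made for illustration. In particular, the hand-written table stands in for a WordNet lookup, and the final grouping step is a naive placeholder, not the paper's clustering procedure.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: each word maps to a concept label.
# A real system would derive this from WordNet synsets instead.
SYNONYM_TO_CONCEPT = {
    "car": "vehicle", "automobile": "vehicle", "auto": "vehicle",
    "physician": "doctor", "doctor": "doctor", "medic": "doctor",
}

# Illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "and"}


def to_concepts(text):
    """Return the set of concepts in a document; unknown words map to themselves."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return {SYNONYM_TO_CONCEPT.get(t, t) for t in tokens}


def frequent_concepts(concept_sets, min_support):
    """Keep concepts that appear in at least min_support (a fraction) of documents."""
    counts = Counter(c for s in concept_sets for c in s)
    threshold = min_support * len(concept_sets)
    return sorted(c for c, n in counts.items() if n >= threshold)


def feature_vectors(frequent, concept_sets):
    """Binary vector per document over the frequent concepts only."""
    return [[1 if c in s else 0 for c in frequent] for s in concept_sets]


docs = [
    "the car needs repair",
    "an automobile dealer",
    "the physician arrived",
    "a doctor and a medic",
]

concept_sets = [to_concepts(d) for d in docs]
frequent = frequent_concepts(concept_sets, min_support=0.5)
vectors = feature_vectors(frequent, concept_sets)

# Naive grouping (placeholder): collect document indices under each frequent concept.
clusters = {c: [i for i, s in enumerate(concept_sets) if c in s] for c in frequent}

print(frequent)
print(vectors)
print(clusters)
```

In this toy corpus a plain bag-of-words representation has no content word shared between "car" and "automobile", while the concept mapping places both documents under `vehicle`. The feature space also shrinks from every distinct word to just the frequent concepts.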
