
A New Text Clustering Method Using Hidden Markov Model

Abstract

Text data is high-dimensional and semantically interrelated, so text clustering remains an important topic in data mining. However, little work has investigated the attributes of the clustering process itself; previous studies have focused only on the characteristics of the text. Treating clustering as a dynamic, sequential process, we aim to describe text clustering as state transitions of words or documents. Taking the K-means clustering method as an example, we parse the clustering process into several sequences. Building on research into sequential and temporal data clustering, we propose a new text clustering method using a Hidden Markov Model (HMM). Experiments on Reuters-21578 show that this approach provides an accurate clustering partition and achieves better performance than the K-means algorithm.
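The abstract does not spell out how the K-means process is parsed into sequences, so the following is only a minimal sketch of one possible reading, not the authors' method. It runs plain K-means on made-up 2-D vectors, records each point's cluster label at every iteration, and estimates label-to-label transition frequencies. The function names, the toy data, and the first-k-points initialization are all assumptions made for illustration, and the result is an ordinary Markov transition table rather than a fitted HMM.

```python
def kmeans_assignment_sequences(points, k, iterations=5):
    """Run plain K-means; return each point's cluster label per iteration."""
    # Assumption: deterministic initialization from the first k points.
    centroids = [list(p) for p in points[:k]]
    sequences = [[] for _ in points]
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            sequences[i].append(dists.index(min(dists)))
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for i, p in enumerate(points) if sequences[i][-1] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return sequences


def transition_matrix(sequences, k):
    """Estimate row-normalized label-to-label transition frequencies."""
    counts = [[0] * k for _ in range(k)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy "document vectors" (made-up data with two obvious groups).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
sequences = kmeans_assignment_sequences(points, k=2)
matrix = transition_matrix(sequences, k=2)
```

Here `sequences[i]` is the path of point `i` through cluster labels over the iterations, which is the kind of sequential data an HMM could then be trained on. In this toy run the second point starts in cluster 1 and moves to cluster 0 after the first centroid update.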
