Conference: IEEE International Conference on Safety Produce Informatization
Clustering of Short Text in Micro-blog Based on K-means Algorithm

Abstract

Based on the K-means algorithm, this paper proposes a short-text clustering method. First, short-text data are collected from the Internet with a web crawler. The texts are then preprocessed to remove irrelevant content such as noisy data, punctuation, and stop words. Next, word segmentation is performed on the preprocessed short texts, and the segmented words are given a distributed representation. Finally, the texts are clustered and sorted using the K-means algorithm. The experimental results indicate that the proposed method is suitable for short-text clustering.
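The pipeline in the abstract (preprocess, segment, represent, cluster) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the stop-word list, the toy 2-D word vectors in `EMBEDDINGS`, the sample texts, and all function names are hypothetical; whitespace splitting stands in for word segmentation (Chinese micro-blog text would need a real segmenter); averaging word vectors is one common way to turn distributed word representations into a text vector, not necessarily the one the authors used; and the crawling step is omitted.

```python
import random
import re

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "of", "is", "and", "to"}

# Hypothetical toy word vectors. In practice these would come from a trained
# distributed-representation model (e.g. word2vec); 2-D keeps the example small.
EMBEDDINGS = {
    "football": [1.0, 0.0], "match": [0.9, 0.1], "goal": [0.95, 0.05],
    "stock": [0.0, 1.0], "market": [0.1, 0.9], "price": [0.05, 0.95],
}


def preprocess(text):
    """Drop URLs/@mentions (stand-in 'noise'), punctuation and stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    # Whitespace splitting stands in for word segmentation (an assumption).
    return [w for w in text.split() if w not in STOP_WORDS]


def embed(tokens, dim=2):
    """Represent a short text as the mean of its known word vectors."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    # Initialise from k distinct points so no two centroids start identical.
    distinct = [list(v) for v in dict.fromkeys(tuple(v) for v in vectors)]
    if len(distinct) < k:
        raise ValueError("need at least k distinct vectors")
    centroids = random.Random(seed).sample(distinct, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each text joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


texts = [
    "Great goal in the football match! http://t.cn/xyz",
    "The match ended with a late goal @fan",
    "Stock market falls, price of oil up",
    "Market price of the stock is rising",
]
vectors = [embed(preprocess(t)) for t in texts]
labels, _ = kmeans(vectors, k=2)
print(labels)
```

With these toy vectors the two football posts land in one cluster and the two market posts in the other; which cluster receives label 0 depends on the random initialisation. A library implementation such as scikit-learn's `KMeans` (with k-means++ initialisation and multiple restarts) would be the more robust choice on real data.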