Conference: IEEE International Conference of Safety Produce Informatization

Clustering of Short Text in Micro-blog Based on K-means Algorithm

Abstract

This paper proposes a short-text clustering method based on the K-means algorithm. First, short texts are collected from the Internet with a web crawler. Next, the texts are preprocessed to remove irrelevant content such as noisy data, punctuation, and stop words. The preprocessed texts are then segmented into words, and the segmented words are mapped to distributed representations. Finally, the texts are clustered and sorted using K-means. The experimental results indicate that the proposed method is suitable for short-text clustering.
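
The abstract gives no implementation details, so the following is only a minimal sketch of the preprocess, segment, vectorize, and cluster steps it describes, not the authors' code. Several parts are assumptions made for illustration: the toy `STOP_WORDS` list, whitespace splitting in place of a real word segmenter, L2-normalised term-frequency vectors standing in for the distributed representation, and deterministic farthest-first seeding for the K-means centroids. The crawler step is omitted, and the function names and sample posts are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy stop-word list; a real pipeline would load a full list.
STOP_WORDS = {"the", "a", "is", "on", "in", "and", "of", "to"}


def preprocess(text):
    """Lowercase, strip URLs and punctuation, split into tokens, drop stop words.

    Whitespace splitting is a stand-in for word segmentation (assumption).
    """
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def vectorize(token_lists):
    """Build L2-normalised term-frequency vectors.

    This is a simple substitute for the distributed representation
    mentioned in the abstract (assumption).
    """
    vocab = sorted({tok for toks in token_lists for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in token_lists:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain K-means; returns one cluster label per input vector."""
    # Farthest-first seeding keeps the sketch deterministic (assumption;
    # the abstract does not say how centroids are initialised).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample posts, two about football and two about phones.
posts = [
    "The football match is on tonight!",
    "Football match tonight, go team",
    "New phone camera review",
    "Phone camera review: battery and screen",
]
labels = kmeans(vectorize([preprocess(p) for p in posts]), k=2)
print(labels)
```

For Chinese micro-blog text, the whitespace split would need to be replaced by a proper word segmenter, and the term-frequency vectors by learned word or sentence embeddings; the clustering loop itself would stay the same.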
