Conference: IEEE International Conference on Artificial Intelligence and Computer Applications
Cross Platform Text Mining Based on Public Emergency—Using Word2vec Model and K-means Algorithm

Abstract

In recent years, as the Internet has grown more popular, the number of Internet users in China has reached 854 million, and social platforms have diversified and developed rapidly. Internet users can express their opinions on trending social issues in many ways. With the word2vec model and the K-means algorithm at its core, this paper crawls three kinds of text from two platforms, Bilibili.com and Zhihu.com, collecting 28,816 texts with more than 1.35 million words. After cleaning and noise reduction, the Jieba package is used for word segmentation and word-frequency calculation to support a preliminary topic analysis. A word2vec model is then constructed, and the K-means algorithm is combined with human-computer cooperation to achieve more accurate text clustering. The results compare differences in user expression across platforms and can be used to analyze the evolution of public opinion. Accurate user feedback can promote the output of high-quality content and increase the platform's user stickiness. The improved K-means algorithm also improves the credibility of the text clustering.
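The pipeline described in the abstract (segmented text, word frequencies, word vectors, K-means clustering) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The tokens, the hand-made 2-D vectors in `TOY_VECTORS` (stand-ins for trained word2vec embeddings), and the function names are all hypothetical. It assumes the texts are already segmented, which the paper does with Jieba, and it uses plain Lloyd's K-means rather than the improved, human-assisted variant the abstract mentions.

```python
from collections import Counter
import math

# Hypothetical hand-made 2-D vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "virus": (0.9, 0.1),
    "mask": (0.8, 0.2),
    "hospital": (1.0, 0.0),
    "video": (0.1, 0.9),
    "uploader": (0.0, 1.0),
    "funny": (0.2, 0.8),
}

# Illustrative texts, already split into tokens (the paper uses Jieba for this step).
DOCS = [
    ["virus", "mask", "hospital"],
    ["video", "uploader", "funny"],
    ["mask", "hospital"],
    ["funny", "video"],
]


def word_frequencies(docs):
    """Count how often each token occurs across all documents."""
    return Counter(token for doc in docs for token in doc)


def doc_vector(tokens, vectors):
    """Represent a document as the mean of its known token vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("document has no tokens with a known vector")
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


frequencies = word_frequencies(DOCS)
points = [doc_vector(doc, TOY_VECTORS) for doc in DOCS]
labels, centroids = kmeans(points, k=2)
print(frequencies.most_common(3))
print(labels)
```

Seeding the centroids with the first `k` points keeps the sketch deterministic. In practice the result of K-means depends on initialization, which is one reason a manual review step such as the human-computer cooperation mentioned in the abstract may be useful.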