Journal: 计算机工程与科学 (Computer Engineering and Science)

A Microblog Hotspot Mining Method Based on Semantic Clustering of Word Vectors

     

Abstract

With the rapid development of social media, information overload has become a challenge. As a result, automatically mining hotspots (hot topics or hot events) from massive volumes of short, noisy data is an important problem. Social media data are real-time and geographic, and usually carry plentiful metadata. Based on these characteristics, this paper proposes a hotspot mining method that combines user behavior analysis with text content analysis. In the content analysis stage, text is clustered at the finer word granularity rather than at the message granularity used by classical methods. In addition, word vectors are introduced and hotspots are mined through semantic clustering, which improves topic keyword extraction. Experimental results on real datasets show that the keywords extracted by this method have strong semantic relevance and yield good topic segmentation, and that the method outperforms traditional hotspot mining methods on the main metrics.
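
The idea of clustering at word granularity using word vectors can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the similarity measure, or how the vectors are trained, so everything below is an assumption made for illustration: the toy three-dimensional vectors, the hypothetical `cluster_words` helper, the cosine similarity measure, the greedy single-pass strategy, and the 0.8 threshold are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.8):
    """Greedy single-pass clustering (an assumed stand-in, not the paper's method).

    Each word joins the first cluster whose centroid has cosine similarity
    >= threshold with the word's vector; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": [...], "words": [...]}
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(cluster["centroid"], vec) >= threshold:
                cluster["words"].append(word)
                n = len(cluster["words"])
                # Update the centroid as the running mean of member vectors.
                cluster["centroid"] = [
                    (c * (n - 1) + x) / n
                    for c, x in zip(cluster["centroid"], vec)
                ]
                break
        else:
            clusters.append({"centroid": list(vec), "words": [word]})
    return [c["words"] for c in clusters]


# Hypothetical toy vectors; real word vectors would come from a trained model.
toy_vectors = {
    "earthquake": [0.9, 0.1, 0.0],
    "aftershock": [0.8, 0.2, 0.1],
    "rescue": [0.7, 0.3, 0.0],
    "concert": [0.0, 0.9, 0.4],
    "singer": [0.1, 0.8, 0.5],
}

topics = cluster_words(toy_vectors, threshold=0.8)
print(topics)
```

With these toy vectors, semantically close words fall into the same group, and each group can be read as the keyword set of one candidate topic. In practice the candidate words would first be filtered, for example by the user behavior signals the abstract mentions, before clustering.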
