Finding Core Topics: Topic Extraction with Clustering on Tweet

Abstract

Twitter is one of the most popular microblogging services; it lets users post short texts called tweets. Tweets differ from conventional text data in that they are typically short, informal messages, which causes typical text analysis methods to perform poorly. Accordingly, extracting meaningful topics from tweets poses new challenges. In this work, we propose a simple and novel method called Core-Topic-based Clustering (CTC), which simultaneously extracts topics and clusters tweets based on two clustering principles: minimizing inter-cluster similarity and maximizing intra-cluster similarity. Experimental results show that our method efficiently extracts meaningful topics and that its clustering performance is better than that of the K-means algorithm.
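
The abstract does not describe how CTC itself works, so the sketch below does not attempt to reproduce it. It only illustrates the two clustering principles named above by scoring a given cluster assignment: the average pairwise cosine similarity within clusters (which should be high) versus across clusters (which should be low). The bag-of-words representation, the naive whitespace tokenizer, the function names, and the sample tweets are all illustrative assumptions, not details from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased whitespace tokens -> term counts (assumed, deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def intra_inter_similarity(tweets, labels):
    """Return (mean intra-cluster similarity, mean inter-cluster similarity)."""
    vectors = [bag_of_words(t) for t in tweets]
    intra, inter = [], []
    for i, j in combinations(range(len(tweets)), 2):
        sim = cosine(vectors[i], vectors[j])
        # Pairs sharing a label count toward intra-cluster similarity.
        (intra if labels[i] == labels[j] else inter).append(sim)
    return mean(intra), mean(inter)


# Hypothetical example: two topics, scored under a sensible and a shuffled assignment.
tweets = [
    "new phone launch today",
    "phone launch event live",
    "rain and storm warning tonight",
    "storm warning for the coast",
]
print(intra_inter_similarity(tweets, [0, 0, 1, 1]))  # grouped by topic
print(intra_inter_similarity(tweets, [0, 1, 0, 1]))  # topics mixed
```

Under the topic-aligned assignment the intra-cluster score exceeds the inter-cluster score, and the mixed assignment reverses that. Any clustering method guided by these principles, whether K-means or an approach like CTC, can be read as searching for assignments that widen this gap; raw short-text vectors share few terms, which hints at why plain term matching struggles on tweets.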