International Conference on Audio, Language and Image Processing

Extracting topic keywords from Sina Weibo text sets

Abstract

Sina Weibo is one of the most popular microblogging websites in China. It has more than 500 million registered users who produce over 100 million posts per day, with a market penetration similar to Twitter's. Mining useful information from a large volume of fragmented short texts is a fundamental but very challenging research task. This paper proposes a method, LET (LDA & Entropy & TextRank), to extract topic keywords from the text sets of Sina Weibo topics. LET combines the merits of LDA, entropy, and TextRank, considering both the topic influence and the topic discrimination of keywords. In addition, we design a new standard evaluation method, KESS (topic KEywords Standard Sequence). Based on KESS, we compute the offset loss scores for four different keyword extraction methods. Extensive simulations show that LET is a comparatively efficient and effective method for obtaining topic words from hot topics on Sina Weibo.
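The abstract gives no formulas, so the following is only a rough sketch of how two of the named ingredients might interact, not the paper's LET method. It assumes posts are already tokenized, scores "topic influence" with a plain TextRank over a co-occurrence graph, scores "topic discrimination" as one minus the normalized entropy of a word's relative frequency across topics, and multiplies the two. The LDA component is omitted, and the function names (`textrank`, `discrimination`, `topic_keywords`), the window size, and the multiplicative combination are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def textrank(docs, window=2, damping=0.85, iters=50):
    """Score words by PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for tokens in docs:
        for i, w in enumerate(tokens):
            # Link each word to the next `window` words in the same post.
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    neighbors[w].add(v)
                    neighbors[v].add(w)
    words = list(neighbors)
    if not words:
        return {}
    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        score = {
            w: (1 - damping) / len(words)
            + damping * sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            for w in words
        }
    return score


def discrimination(topic_docs):
    """Return 1 - normalized entropy of each word's spread across topics.

    Values near 1 mean the word is concentrated in one topic;
    values near 0 mean it is spread evenly over all topics.
    """
    counts = {
        t: Counter(w for doc in docs for w in doc)
        for t, docs in topic_docs.items()
    }
    totals = {t: sum(c.values()) for t, c in counts.items()}
    vocab = set().union(*counts.values())
    max_h = math.log(len(topic_docs)) if len(topic_docs) > 1 else 1.0
    result = {}
    for w in vocab:
        # Relative frequency of w inside each topic, then normalized over topics.
        rel = [counts[t][w] / totals[t] for t in counts if totals[t]]
        z = sum(rel)
        h = -sum((r / z) * math.log(r / z) for r in rel if r > 0)
        result[w] = 1.0 - h / max_h
    return result


def topic_keywords(topic_docs, topic, k=3):
    """Rank words of one topic by TextRank score times discrimination (assumed rule)."""
    influence = textrank(topic_docs[topic])
    disc = discrimination(topic_docs)
    combined = {w: influence[w] * disc[w] for w in influence}
    return sorted(combined, key=combined.get, reverse=True)[:k]
```

With a toy input such as `{"sports": [["team", "wins", "match", "today"]], "weather": [["rain", "today", "storm"]]}`, a word like "today" that occurs in every topic receives a discrimination weight close to zero and drops to the bottom of each topic's ranking, which is the intuition behind weighting influence by discrimination.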
