International Conference on Big Data Computing and Communications

Determining the Topic Hashtags for Chinese Microblogs Based on 5W Model

Abstract

A hashtag is an important piece of metadata in microblogs, used to mark topics or index messages. With topic-related hashtags, microblogs are well grouped, so users can retrieve them efficiently and follow conversations of interest. At the same time, microblogging service providers can leverage hashtags to classify massive volumes of microblogs when building high-level applications such as event detection and tracking, sentiment analysis, and opinion mining. However, statistics show that most microblogs carry no hashtag at all. In this paper, we summarize the similarities between microblogs and short-message-style news, and then propose an algorithm named 5WTAG that detects microblog topics based on the five-W model (When, Where, Who, What, hoW). Since the five-W (5W) attributes are the core components of an event description, 5WTAG is theoretically guaranteed to extract the semantic topic of a microblog properly. We present the detailed procedure of 5WTAG, including microblog segmentation and candidate hashtag construction. We also propose a novel recommendation-scoring method for ranking candidate hashtags, which combines syntactic and semantic analysis and takes into account the distribution law of human-annotated topic tags. We conduct comprehensive experiments on real data from Sina Weibo to verify the semantic correctness and completeness of the candidate hashtags, as well as the accuracy of the recommendations.
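The abstract names the stages of 5WTAG (segmentation, candidate hashtag construction, ranking) but does not give their details. The Python sketch below is only a toy illustration of that pipeline shape under loudly stated assumptions: the microblog is already segmented, each token carries a hypothetical 5W role label from some upstream tagger, candidates are order-preserving combinations of role-bearing tokens, and the score (hypothetical role weights, a length discount, and a small bonus for covering more distinct 5W roles) is invented here. It is not the paper's recommendation method, which additionally uses syntactic analysis and the distribution of human-annotated tags. The names `Token`, `build_candidates`, `score`, `rank_candidates`, and `ROLE_WEIGHT` are all hypothetical. The `#...#` wrapping mirrors Sina Weibo's double-hash topic syntax.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple

# Hypothetical 5W role labels; the paper's actual labeling scheme is not given in the abstract.
ROLES = ("when", "where", "who", "what", "how")

# Hypothetical weights (an assumption): treat 'what' and 'who' as the most topical roles.
ROLE_WEIGHT = {"what": 3.0, "who": 2.0, "where": 1.0, "when": 0.5, "how": 0.5}


@dataclass(frozen=True)
class Token:
    text: str
    role: Optional[str]  # one of ROLES, or None if the token has no 5W role


def build_candidates(tokens: List[Token], max_parts: int = 3) -> List[Tuple[Token, ...]]:
    """Form candidate hashtags from 1..max_parts role-bearing tokens.

    itertools.combinations keeps the tokens of each tuple in their input order,
    so every candidate reads in the same order as the original microblog.
    """
    tagged = [t for t in tokens if t.role in ROLES]
    candidates: List[Tuple[Token, ...]] = []
    for k in range(1, min(max_parts, len(tagged)) + 1):
        candidates.extend(combinations(tagged, k))
    return candidates


def score(candidate: Tuple[Token, ...], length_penalty: float = 0.7) -> float:
    """Toy score (not the paper's): summed role weights, discounted for each
    extra part, plus a 0.5 bonus per distinct 5W role covered."""
    base = sum(ROLE_WEIGHT[t.role] for t in candidate)
    distinct_roles = len({t.role for t in candidate})
    return base * length_penalty ** (len(candidate) - 1) + 0.5 * distinct_roles


def rank_candidates(tokens: List[Token], top_k: int = 3) -> List[str]:
    """Return the top_k candidates rendered as Weibo-style #topic# strings."""
    ranked = sorted(build_candidates(tokens), key=score, reverse=True)
    return ["#" + "".join(t.text for t in c) + "#" for c in ranked[:top_k]]
```

For example, with the tokens `Token("昨天", "when")`, `Token("北京", "where")`, `Token("马拉松", "what")`, and `Token("开跑", None)`, `rank_candidates` returns `['#北京马拉松#', '#昨天北京马拉松#', '#马拉松#']` under these toy weights. Changing the weights or `length_penalty` reorders the list; that scoring step is exactly where the paper's syntactic and semantic analysis would take over.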