
Modeling Chinese microblogs with five Ws for topic hashtags extraction


Abstract

Hashtags are important metadata in microblogs and are used to mark topics or index messages. However, statistics show that most microblogs carry no hashtags, which poses great challenges for the retrieval and analysis of these tagless microblogs. In this paper, we summarize the similarity between microblogs and short-message-style news, and then propose an algorithm, named 5WTAG, for detecting microblog topics based on a model of five Ws (When, Where, Who, What, hoW). Because the five-W attributes are the core components of an event description, 5WTAG is theoretically guaranteed to extract semantic topics from microblogs properly. We describe the procedure of the algorithm in detail, including spam microblog identification, microblog segmentation, and candidate hashtag construction. In addition, we propose a novel recommendation computing method for ranking candidate hashtags, which combines syntactic and semantic analysis and observes the distribution of artificial topic hashtags. Finally, using real data from Sina Weibo, we conduct comprehensive experiments to verify the semantic correctness and completeness of the candidate hashtags, as well as the accuracy of the recommendation method.

Bibliographic record

  • Source
    Tsinghua Science and Technology | 2017, Issue 2 | pp. 135-148 | 14 pages
  • Author affiliations

    College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Twitter; Tagging; Semantics; Computational modeling; Algorithm design and analysis; Syntactics; Sentiment analysis;

