Venue: IEEE International Conference on Data Mining Workshops

EmTaggeR: A Word Embedding Based Novel Method for Hashtag Recommendation on Twitter

Abstract

The hashtag recommendation problem is that of recommending (suggesting) one or more hashtags to explicitly tag a post made on a given social network platform, based on the content and context of the post. In this work, we propose a novel methodology for hashtag recommendation for microblog posts, specifically on Twitter. The methodology, EmTaggeR, is built on a training-testing framework grounded in the concept of word embedding. The training phase comprises learning the word vectors associated with each hashtag and deriving a word embedding for each hashtag. We provide two training procedures: one in which each hashtag is trained with a separate word embedding model applicable in the context of that hashtag, and another in which each hashtag obtains its embedding from a global context. The testing phase consists of computing the average word embedding of the test post and measuring the similarity of this embedding to the known embeddings of the hashtags. The tweets that contain the most similar hashtag are extracted, and all the hashtags that appear in these tweets are ranked by their embedding similarity scores. The top-K hashtags in this ranked list are recommended for the given test post. Our system achieves an F1 score of 50.83%, a lift of around 6.53 times over the LDA baseline, outperforming the best-performing system known in the literature, which provides a lift of 6.42 times. EmTaggeR is a fast, scalable, and lightweight system, which makes it practical to deploy in real-life applications.
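The training and testing phases described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It covers only the global-context training variant. Word vectors are supplied as a plain dict rather than trained. A hashtag's embedding is assumed to be the mean of the known word vectors in the training tweets that carry it. Cosine similarity is assumed as the similarity measure, since the abstract does not name one. The helper names (`train_hashtag_embeddings`, `recommend`) and the toy data are hypothetical.

```python
"""Toy sketch of an EmTaggeR-style pipeline (global-context variant only).

Assumptions, not taken from the paper: pre-trained word vectors are supplied
as a plain dict, a hashtag's embedding is the mean of all known word vectors
in the training tweets that carry it, and similarity is cosine similarity.
"""
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Tweet = Tuple[List[str], List[str]]  # (words, hashtags)


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise average of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def post_embedding(words: List[str], word_vecs: Dict[str, Vector]) -> Optional[Vector]:
    """Average word embedding of a post; out-of-vocabulary words are skipped."""
    known = [word_vecs[w] for w in words if w in word_vecs]
    return mean_vector(known) if known else None


def train_hashtag_embeddings(tweets: List[Tweet], word_vecs: Dict[str, Vector]) -> Dict[str, Vector]:
    """Training phase: one embedding per hashtag from a single shared word model."""
    collected: Dict[str, List[Vector]] = defaultdict(list)
    for words, tags in tweets:
        known = [word_vecs[w] for w in words if w in word_vecs]
        for tag in tags:
            collected[tag].extend(known)
    return {tag: mean_vector(vs) for tag, vs in collected.items() if vs}


def recommend(post_words: List[str], tweets: List[Tweet], word_vecs: Dict[str, Vector],
              tag_vecs: Dict[str, Vector], k: int = 3) -> List[str]:
    """Testing phase: return up to k hashtags for an unseen post."""
    query = post_embedding(post_words, word_vecs)
    if query is None or not tag_vecs:
        return []
    # 1. Hashtag whose embedding is most similar to the post embedding.
    best = max(tag_vecs, key=lambda t: cosine(query, tag_vecs[t]))
    # 2. Every hashtag that appears in a training tweet containing `best`.
    candidates = {t for _, tags in tweets if best in tags for t in tags if t in tag_vecs}
    # 3. Rank the candidates by similarity to the post and keep the top k.
    ranked = sorted(candidates, key=lambda t: cosine(query, tag_vecs[t]), reverse=True)
    return ranked[:k]


# Hypothetical 2-D word vectors and training tweets, for illustration only.
WORD_VECS = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1], "striker": [0.8, 0.2],
    "vote": [0.0, 1.0], "election": [0.1, 0.9], "poll": [0.2, 0.8],
}
TWEETS = [
    (["goal", "match"], ["#football", "#sports"]),
    (["striker", "goal"], ["#football", "#epl"]),
    (["vote", "election"], ["#politics"]),
    (["poll", "vote"], ["#politics", "#election2016"]),
]
TAG_VECS = train_hashtag_embeddings(TWEETS, WORD_VECS)

if __name__ == "__main__":
    print(recommend(["match", "goal"], TWEETS, WORD_VECS, TAG_VECS))  # prints ['#sports', '#football']
```

Step 2 restricts the candidates to hashtags that co-occur with the single most similar hashtag. As a result, `#epl` is not suggested for the football query even though its embedding is close. The per-hashtag (local) training variant would instead derive each hashtag's embedding from its own separately trained word-embedding model. Under this sketch's assumptions, that variant would replace only `train_hashtag_embeddings`.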