Corpus-based Topic Derivation and Timestamp-based Popular Hashtag Prediction in Twitter

Kumar Sharath B. R.; Wang Kuochen; Shen Shi-Min

Abstract

With the use of the Internet, mobile platforms, online commerce, and social media services, the footprints of human behavior can be easily recorded in the digital world, which generates data on an extremely large scale. Twitter, as a big data social network, has become one of the most important sources for capturing up-to-date events happening in the world. Deriving topics from Twitter is important for various applications, such as situation awareness, market analysis, content filtering, and recommendations. However, topic derivation with high purity in Twitter is hard to achieve because tweets are limited to 140 characters, and previous works on topic derivation in Twitter suffer from low purity. In this paper, we propose a corpus-based topic derivation (CTD) approach that combines a Twitter corpus with LF-LDA, a text processing model, to identify topics and clusters of similar hashtags. We use asymmetric topic LF-LDA to obtain better purity of topics. Compared to intJNMF, a representative related work, the purity (F-measure) of our proposed CTD increases from 5.26% (27.81%) to 11.32% (34.28%) for 20 to 100 topics. We also propose a timestamp-based popular hashtag prediction (TPHP) approach that uses the timestamps in tweets to create trending hashtag lists (THLs), which are lists of hashtags used by many users. We use the edit distance to measure the difference between consecutive THLs. This difference is then used to calculate volatility, which indicates how people react to real-world events. Compared to Hybrid+, a representative related work, the mean average precision (MAP) of our TPHP increases by 19.45% (week-day), 15.08% (week-week), and 16.95% (month-week).
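
As an illustration of the edit-distance step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that each THL is an ordered list of hashtags and that each hashtag is treated as a single symbol in a standard Levenshtein distance. The function names and the sample lists are hypothetical, and the abstract does not say how the distances are turned into a volatility score, so that step is not attempted here.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences, one hashtag per symbol."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def consecutive_distances(thls: List[List[str]]) -> List[int]:
    """Edit distance between each pair of consecutive lists."""
    return [edit_distance(p, q) for p, q in zip(thls, thls[1:])]


# Hypothetical trending hashtag lists for three consecutive time windows.
thls = [
    ["#a", "#b", "#c"],
    ["#a", "#c", "#d"],
    ["#x", "#y", "#z"],
]
distances = consecutive_distances(thls)
print(distances)
```

A larger distance between two consecutive lists means the set or ranking of trending hashtags changed more between those time windows.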


Bibliographic record

Source: Journal of Information Recording, 2019, Issue 3, pp. 675-696 (22 pages)
Authors: Kumar Sharath B. R.; Wang Kuochen; Shen Shi-Min
Affiliation: Natl Chiao Tung Univ Dept Comp Sci Hsinchu 300 Taiwan
Format: PDF
Language: English
Keywords: corpus; popular hashtag prediction; timestamp; topic derivation; twitter
Date indexed: 2022-08-18 04:36:16

Similar literature

1. Finding News-Topic Oriented Influential Twitter Users Based on Topic Related Hashtag Community Detection [J]. Xiao Feng, Noro Tomoya, Tokuda Takehiro. Journal of Web Engineering, 2014, Issue 5a6.
2. Understanding Twitter Hashtags from Latent Themes Using Biterm Topic Model [J]. Muzafar R. Bhat, Burhan Bashir, Majid A. Kundroo. Recent Patents on Engineering, 2020, Issue 3.
3. Semantic knowledge LDA with topic vector for recommending hashtags: Twitter use case [J]. Tajbakhsh Mir Saman, Bagherzadeh Jamshid. Intelligent Data Analysis, 2019, Issue 3.
4. Initial indicators of topic success in Twitter: Using topology entropy to predict the success of Twitter hashtags [C]. Planck Max, Pollard Isis Lyman, Brock Charles. 2013 IEEE 2nd Network Science Workshop, 2013.
5. Machine Learning for Topic Classification and Prediction on Twitter [D]. Safari, Kasra. 2019.
6. Hashtags in healthcare: understanding Twitter hashtags and online engagement at the American Association for the Surgery of Trauma 2016–2019 meetings [O]. Kristen Santarone, Dessy Boneva, Mark McKenney. 2020.
7. Coastal and marine topics and destinations during the COVID-19 pandemic in Twitter's tourism hashtags [O]. Orly Carvache-Franco, Mauricio Carvache-Franco, Wilmer Carvache-Franco. 2021.
