Statistical data mining for Sina Weibo, a Chinese micro-blog: sentiment modelling and randomness reduction for topic modelling

Abstract

Before the arrival of modern information and communication technology, it was not easy to capture people's thoughts and sentiments; however, the development of statistical data mining techniques and the prevalence of mass social media provide opportunities to capture those trends. Among all types of social media, micro-blogs use a limit of 140 characters to force users to get straight to the point, making the posts brief but content-rich resources for investigation. The object of data mining in this thesis is Weibo, the most popular Chinese micro-blog.

In the first part of the thesis, we perform exploratory data mining on Weibo. After a literature review of micro-blogs, the initial steps of data collection and data pre-processing are introduced. This is followed by an analysis of posting times, an analysis of the relationship between posting intensity and share price, term frequency analysis, and cluster analysis.

Secondly, we conduct time series modelling on the sentiment of Weibo posts. Considering the properties of Weibo sentiment, we mainly adopt a framework of an ARMA mean with GARCH-type conditional variance to fit the patterns. Other distinct models are also considered for negative sentiment because of its complexity. Model selection and validation are introduced to verify the fitted models.

Thirdly, Latent Dirichlet Allocation (LDA) is explained in depth as a way to discover topics from large sets of textual data. The major contribution is a Randomness Reduction Algorithm that post-processes the output of topic models, filtering out insignificant topics and using topic distributions to find the most persistent topics. At the end of this chapter, evidence of the effectiveness of the Randomness Reduction is presented from empirical studies. The classification and evolution of topics are also presented.
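For readers unfamiliar with the time series framework named in the abstract, the following is a generic textbook statement of an ARMA(p, q) mean with a GARCH(1,1) conditional variance. It is only a sketch: the abstract does not give the orders, the exact GARCH variant, or the notation, so the symbols s_t (sentiment series), mu, phi_i, theta_j, omega, alpha and beta, and the choice of GARCH(1,1), are assumptions made here for illustration and may differ from the models fitted in the thesis.

```latex
% Assumed notation: s_t is the sentiment series; GARCH(1,1) is chosen only as
% the simplest member of the "GARCH type" family mentioned in the abstract.
\begin{align}
  s_t &= \mu + \sum_{i=1}^{p} \phi_i \, s_{t-i}
             + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
             + \varepsilon_t, \\
  \varepsilon_t &= \sigma_t z_t, \qquad z_t \overset{\text{i.i.d.}}{\sim} (0, 1), \\
  \sigma_t^2 &= \omega + \alpha \, \varepsilon_{t-1}^2 + \beta \, \sigma_{t-1}^2,
  \qquad \omega > 0,\ \alpha \ge 0,\ \beta \ge 0 .
\end{align}
```

In this form the ARMA part describes the conditional mean of the series and the GARCH part lets the variance of the shocks change over time; the condition alpha + beta < 1 is the usual requirement for a finite unconditional variance.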

Bibliographic details

Author: Cheng Wenqian
Year: 2017
Original format: PDF
Language: English

Similar literature

1. A short-term trend prediction model of topic over Sina Weibo dataset [J]. Juanjuan Zhao, Weili Wu, Xiaolong Zhang. Journal of Combinatorial Optimization. 2014, No. 3.
2. Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models [J]. Hong Haoyuan, Pourghasemi Hamid Reza, Pourtaghi Zohre Sadat. Geomorphology. 2016, No. Apra15.
3. CRATS: An LDA-Based Model for Jointly Mining Latent Communities, Regions, Activities, Topics, and Sentiments from Geosocial Network Data [J]. Jia-Dong Zhang, Chi-Yin Chow. IEEE Transactions on Knowledge and Data Engineering. 2016, No. 11.
4. Statistically Modelling and Mining Remotely Sensed Data in Urban Areas Based on Topic Models - A Conceptual Analysis [C]. Liwei LI, Bing ZHANG, Junsheng LI. Workshop on Hyperspectral Image and Signal Processing. 2016.
5. Building Dynamic Ontological Models for Place Names Using Social Media Data from Twitter and Sina Weibo [D]. Zhang, Qingyun. 2017.
6. Modeling of Causes of Sina Weibo Continuance Intention with Mediation of Gender Effects [O]. Lingyu Wang, Wenguo Zhao, Xianghong Sun.
7. Analysis model of the most important factors in Covid-19 through data mining, descriptive statistics and random forest [O]. Remigio Ismael Hurtado Ortiz, Juan Carlos Barrera Barrera, Katherine Michelle Barrera Barrera. 2020.
