Examining Information on Social Media: Topic Modelling, Trend Prediction and Community Classification

Abstract

In the past decade, the use of social media networks (e.g. Twitter) has increased dramatically. These networks have become the main channels through which the public expresses its opinions, ideas and preferences, especially during an election or a referendum. Both researchers and the public want to understand three things about a real-world social event: which topics are discussed, how those topics trend, and how they are likely to trend in the future. Modelling such topics and trends gives social scientists the opportunity to continue a long-standing line of research, namely examining the exchange of information between people in different communities. We argue that computing science approaches can adequately assist social scientists in extracting topics from social media data, predicting their topical trends, and classifying a social media user (e.g. a Twitter user) into a community. However, although topic modelling approaches and classification techniques are widely used, challenges remain:

1) existing topic modelling approaches can generate incoherent topics from social media data;
2) the coherence of topics is not easy to evaluate;
3) it can be difficult to build a large training dataset for a social media user classifier.

Hence, we identify four tasks that address these problems and assist social scientists.

First, we aim to propose topic coherence metrics that effectively evaluate the coherence of the topics generated by topic modelling approaches. Such metrics must align with human judgements. Topic modelling approaches do not always generate useful topics, so the coherence metrics are needed to present users with the most coherent ones. An effective coherence metric also helps us evaluate the performance of our proposed topic modelling approaches. The second task is to propose a topic modelling approach that generates more coherent topics from social media data.
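The abstract does not say how the proposed coherence metrics are computed. For reference on the first task, here is a minimal sketch of a common baseline. It is not the thesis's metric. The sketch scores a topic by the average pointwise mutual information (PMI) of its top words, using document co-occurrence counts from a reference corpus of tokenised posts. It then ranks topics by that score so the most coherent ones can be shown first. The function names, the `eps` smoothing constant and the toy posts are illustrative assumptions.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words over a reference corpus.

    topic_words: the top-N words of one topic (at least two words).
    documents: tokenised posts (lists of words) used to count co-occurrence.
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")
    docs = [set(d) for d in documents]
    n_docs = len(docs)

    def prob(*words):
        # Fraction of documents that contain every given word.
        return sum(all(w in d for w in words) for d in docs) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        # eps stops pairs or words that never occur from causing log(0)
        # or a division by zero.
        pmi = math.log((prob(w1, w2) + eps) / (prob(w1) * prob(w2) + eps))
        scores.append(pmi)
    return sum(scores) / len(scores)


def rank_topics(topics, documents):
    """Order topics from most to least coherent under the PMI score."""
    return sorted(topics, key=lambda t: pmi_coherence(t, documents), reverse=True)


# Toy usage: tokenised posts and two candidate topics (hypothetical data).
posts = [
    ["vote", "yes", "independence"],
    ["vote", "yes", "scotland"],
    ["independence", "scotland", "yes"],
    ["football", "match", "goal"],
    ["match", "goal", "team"],
]
topic_a = ["vote", "yes", "independence"]
topic_b = ["vote", "goal", "team"]
print(rank_topics([topic_b, topic_a], posts)[0])  # ['vote', 'yes', 'independence']
```

Pairs of words that never co-occur get a large negative PMI. A topic that mixes unrelated vocabulary, like `topic_b`, therefore falls to the bottom of the ranking.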
We argue that using the time dimension of social media posts helps a topic modelling approach distinguish how word usage differs over time. This allows it to generate more coherent topics together with their trends. A more coherent topic, paired with its trend, lets social scientists quickly identify what the topic is about. They can then focus on analysing the connections between the extracted topics and social events, e.g. an election.

Third, we aim to model and predict topical trends. Given the timestamps of the social media posts within a topic, a topical trend can be modelled as a continuous distribution over time. Therefore, we argue that the future trends of topics can be predicted by estimating the density functions of their continuous time distributions. By examining future topical trends, social scientists can ensure that the events they focus on remain timely. Politicians and policymakers can keep abreast of the topics that remain salient over time.

Finally, we aim to offer a general method for quickly obtaining a large training dataset for a social media user classifier. Social media posts contain hashtags and entities. Hashtags (e.g. "#YesScot" in the Scottish Independence Referendum) and entities (e.g. job titles or party names) can reflect the community affiliation of a social media user. We argue that a large and reliable training dataset can be obtained by distinguishing how these hashtags and entities are used. With this training dataset, a social media user community classifier can be built quickly. Its output can then help in examining the different topics discussed in each community.

In conclusion, we have identified four aspects that can help social scientists better understand the topics discussed on social media networks. We believe that the proposed tools and approaches can help examine how topics are exchanged among communities on social media networks.
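The third task turns discrete post timestamps into a continuous trend. That step can be sketched with a Gaussian kernel density estimator (KDE). The estimator is an assumed choice, since the abstract does not name one. The bandwidth follows Silverman's rule of thumb, and the toy timestamps are invented for illustration. The sketch only estimates the density over the observed period. It does not perform the forecasting the thesis describes, because a plain Gaussian KDE simply decays beyond the last observed post.

```python
import math
from statistics import stdev


def silverman_bandwidth(xs):
    """Silverman's rule of thumb: h = 1.06 * sigma * n^(-1/5)."""
    return 1.06 * stdev(xs) * len(xs) ** -0.2


def topic_time_density(timestamps, bandwidth=None):
    """Gaussian kernel density estimate of one topic's post timestamps.

    timestamps: post times for the topic, e.g. days since collection began.
    Returns a function f(t) giving the estimated density at time t.
    """
    xs = [float(x) for x in timestamps]
    h = silverman_bandwidth(xs) if bandwidth is None else bandwidth
    if h <= 0:
        raise ValueError("bandwidth must be positive (need varied timestamps)")
    norm = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))

    def density(t):
        # Sum of Gaussian kernels centred on each observed timestamp.
        return norm * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs)

    return density


# Toy usage: the days on which posts in one topic appeared (hypothetical).
days = [1, 2, 2, 3, 3, 3, 4, 4, 5]
f = topic_time_density(days)
grid = [i / 10 for i in range(61)]  # days 0.0 to 6.0 in steps of 0.1
peak = max(grid, key=f)
print(peak)  # 3.0
```

Evaluating the returned density on a time grid yields a smooth trend curve whose peaks mark when the topic was most discussed.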
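The fourth task obtains training labels by distinguishing hashtag usage. This can be sketched as a simple seed-hashtag rule. A user is labelled with a community only if every seed hashtag they use belongs to that community. Users who mix seeds from several communities are left out, which keeps the labels reliable. This rule, the function and parameter names, and the `#NoThanks` seed are illustrative assumptions. Entity matching (job titles, party names) is omitted, because it would need a separate named-entity step.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")


def label_users(posts_by_user, seed_hashtags, min_uses=1):
    """Label users whose seed-hashtag usage points to exactly one community.

    posts_by_user: dict mapping a user id to that user's post texts.
    seed_hashtags: dict mapping a community name to lower-cased seed hashtags.
    Users who use seeds from several communities, or none, stay unlabelled.
    """
    labels = {}
    for user, posts in posts_by_user.items():
        counts = Counter()
        for text in posts:
            for tag in HASHTAG.findall(text.lower()):
                for community, seeds in seed_hashtags.items():
                    if tag in seeds:
                        counts[community] += 1
        # Keep the user only if exactly one community's seeds were used,
        # and used at least min_uses times.
        if len(counts) == 1:
            community, uses = counts.most_common(1)[0]
            if uses >= min_uses:
                labels[user] = community
    return labels


# Toy usage: "#YesScot" comes from the abstract; "#NoThanks" is an assumed
# example seed for the opposing side.
seeds = {"yes": {"#yesscot"}, "no": {"#nothanks"}}
posts = {
    "u1": ["Voting today #YesScot"],
    "u2": ["#NoThanks from me", "Still #NoThanks"],
    "u3": ["#YesScot or #NoThanks?"],
    "u4": ["No hashtags here"],
}
print(label_users(posts, seeds))  # {'u1': 'yes', 'u2': 'no'}
```

The labelled users could then serve as training data for a standard text classifier, which is applied to the remaining unlabelled users.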

Bibliographic details

  • Author: Fang Anjie
  • Year: 2017
  • Original format: PDF
  • Text language: English