Workshop on natural language processing for internet freedom

Assessing Post Deletion in Sina Weibo: Multi-modal Classification of Hot Topics



Abstract

Popular Chinese social media platforms such as Weibo are widely known to monitor and delete posts to comply with Chinese government requirements. In this paper, we analyze a dataset of censored and uncensored Weibo posts. Whereas previous work considers only the text content of posts, we take a multi-modal approach that accounts for both text and image content. We categorize the dataset into 14 categories that are potentially subject to censorship on Weibo and seek to quantify censorship by topic. Specifically, we investigate how different factors interact to affect censorship, and how consistently and how quickly different topics are censored. To this end, we assembled an image dataset of 18,966 images and a text dataset of 994 posts from the 14 categories. We then used deep learning, CNN localization, and NLP techniques to analyze the dataset and extract categories for further analysis, in order to better understand Weibo's censorship mechanisms. We found that sentiment is the only indicator of censorship that is consistent across all the topics we identified, a finding that matches recently leaked logs from Sina Weibo. We also found that most categories, such as those related to anti-government actions (e.g., protest) or to politicians (e.g., Xi Jinping), are often censored, whereas some categories, such as crisis-related ones (e.g., rainstorm), are censored less frequently. Finally, censored posts in all categories are deleted within three hours on average.
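The questions of how consistently and how quickly each topic is censored reduce to simple per-category aggregates: the fraction of posts that were deleted, and the mean time between posting and deletion. The following is a minimal sketch, assuming a hypothetical `Post` record with a category label, a posting time, and a deletion time (`None` if the post survived). The field names, helper functions, and toy data are illustrative only; they are not the authors' schema, code, or results.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Post:
    """Hypothetical post record; not the paper's actual schema."""
    category: str                   # topic label, e.g. "protest" or "rainstorm"
    posted_at: datetime
    deleted_at: Optional[datetime]  # None if the post was never deleted


def hours_to_deletion(post: Post) -> Optional[float]:
    """Hours between posting and deletion, or None for uncensored posts."""
    if post.deleted_at is None:
        return None
    return (post.deleted_at - post.posted_at).total_seconds() / 3600


def per_category_stats(posts: List[Post]) -> Dict[str, Tuple[float, Optional[float]]]:
    """Map each category to (censorship rate, mean hours to deletion).

    The mean is None for a category in which no post was deleted.
    """
    totals: Dict[str, int] = defaultdict(int)
    delays: Dict[str, List[float]] = defaultdict(list)
    for post in posts:
        totals[post.category] += 1
        h = hours_to_deletion(post)
        if h is not None:
            delays[post.category].append(h)
    stats = {}
    for category, n in totals.items():
        d = delays[category]
        stats[category] = (len(d) / n, sum(d) / len(d) if d else None)
    return stats


def pooled_mean_hours(posts: List[Post]) -> Optional[float]:
    """Mean hours to deletion over all censored posts, regardless of category."""
    hours = [h for h in map(hours_to_deletion, posts) if h is not None]
    return sum(hours) / len(hours) if hours else None


# Toy data for illustration only; these are not the paper's measurements.
t0 = datetime(2020, 1, 1, 12, 0)
posts = [
    Post("protest", t0, datetime(2020, 1, 1, 14, 0)),    # deleted after 2 h
    Post("protest", t0, datetime(2020, 1, 1, 16, 0)),    # deleted after 4 h
    Post("rainstorm", t0, None),                         # never deleted
    Post("rainstorm", t0, datetime(2020, 1, 1, 15, 0)),  # deleted after 3 h
]
print(per_category_stats(posts))  # {'protest': (1.0, 3.0), 'rainstorm': (0.5, 3.0)}
print(pooled_mean_hours(posts))   # 3.0
```

The sketch reports both a per-category mean and a pooled mean because the two can differ when categories have very different numbers of deleted posts; which of the two underlies an "on average" figure is a choice the analysis has to make explicitly.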
