Journal: Procedia Computer Science

Propaganda Identification Using Topic Modelling

Abstract

This paper presents a topic-modelling method for identifying texts with propagandistic content. The method attempts to apply the transfer-learning idea of obtaining an effective vector representation from a large unlabelled or (semi-)automatically labelled dataset, while minimizing the amount of manual expert labelling required by introducing high-level labelling (either manual or automatic) of some explicit document property. The proposed method has four key stages: partitioning the corpus, computing a topic model of the united corpus, calculating a corpus imbalance estimate for each topic, and extrapolating the imbalance estimates to all documents. The method was cross-validated on a labelled subsample of 1,000 news items and achieved a ROC AUC of 0.73, which the authors describe as high predictive power.
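
The last two stages can be illustrated with a minimal Python sketch. The abstract does not say how the imbalance estimate is computed, so everything here is an assumption made for illustration: the imbalance of a topic is taken to be the log ratio of its mean weight in the two corpus partitions, and a document's score is the topic-weighted sum of those imbalances. The names `topic_imbalance` and `document_scores` and the toy data are hypothetical, and the document-topic weights are assumed to come from any topic model fitted on the united corpus.

```python
import math


def topic_imbalance(doc_topics, partition, eps=1e-9):
    """Stage 3 (assumed form): per-topic log ratio of mean topic weight
    in partition 1 versus partition 0."""
    n_topics = len(doc_topics[0])
    sums = {0: [0.0] * n_topics, 1: [0.0] * n_topics}
    counts = {0: 0, 1: 0}
    for weights, label in zip(doc_topics, partition):
        counts[label] += 1
        for k, w in enumerate(weights):
            sums[label][k] += w
    imbalance = []
    for k in range(n_topics):
        mean0 = sums[0][k] / max(counts[0], 1)
        mean1 = sums[1][k] / max(counts[1], 1)
        # eps avoids log(0) for topics absent from one partition
        imbalance.append(math.log((mean1 + eps) / (mean0 + eps)))
    return imbalance


def document_scores(doc_topics, imbalance):
    """Stage 4 (assumed form): extrapolate to documents by weighting each
    topic's imbalance by the document's topic weight."""
    return [sum(w * b for w, b in zip(weights, imbalance))
            for weights in doc_topics]


# Toy document-topic weights (rows: documents, columns: topics).
doc_topics = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
# Stage 1: high-level partition label per document (hypothetical values).
partition = [1, 1, 0, 0]

imbalance = topic_imbalance(doc_topics, partition)
scores = document_scores(doc_topics, imbalance)
```

Under these assumptions, topics that are over-represented in partition 1 get a positive imbalance, and documents dominated by such topics get higher scores. The scores could then be compared against expert labels with a ranking metric such as ROC AUC.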