Home > Foreign-language journals > Data > KazNewsDataset: Single Country Overall Digital Mass Media Publication Corpus

KazNewsDataset: Single Country Overall Digital Mass Media Publication Corpus

Abstract

Mass media is one of the most important elements influencing the information environment of society. The mass media is not only a source of information about current events but is often the authority that shapes the information agenda and the boundaries and forms of discussion on socially relevant topics. A multifaceted and, where possible, quantitative assessment of mass media performance is crucial for understanding its objectivity, tone, thematic focus, and quality. The paper presents a corpus of Kazakhstan media containing over 4 million publications from 36 primary sources (each with at least 500 publications). The corpus also includes more than 2 million texts from Russian media, for comparative analysis of publication activity across the countries, and about 4000 sections of state policy documents. The paper briefly describes the natural language processing and multiple-criteria decision-making methods that form the algorithmic basis of the text and mass media evaluation method, and reports the results of several research cases: identification of propaganda, assessment of the tone of publications, calculation of the level of socially relevant negativity, and comparative analysis of publication activity in the field of renewable energy. Experiments confirm that, using a topic model of the text corpus, it is generally possible to evaluate socially significant news, identify texts with propagandistic content, and evaluate the sentiment of publications: area under the receiver operating characteristic curve (ROC AUC) values of 0.81, 0.73, and 0.93 were achieved on the abovementioned tasks. The described cases do not exhaust the possibilities for thematic, tonal, dynamic, and other analyses of the corpus. The corpus will be of interest to researchers working on large-scale publication and mass media analysis, including comparative analysis and the identification of common patterns inherent in the media of different countries.
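For readers unfamiliar with the ROC AUC metric quoted in the abstract, the following is a minimal sketch of how it can be computed for a binary classifier. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are made up for illustration. The sketch uses the pairwise interpretation of ROC AUC, namely the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth classes (hypothetical toy data below).
    scores: iterable of classifier scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic pairwise loop is chosen for clarity; for corpora of millions of texts a sort-based computation would be the practical choice.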
