Automated identification of bias inducing words in news articles using linguistic and context-oriented features

Abstract

Media has a substantial impact on public perception of events, and, accordingly, the way media presents events can potentially alter the beliefs and views of the public. One way in which bias can be introduced into news articles is through word choice. This form of bias is very challenging to identify automatically because it is highly context-dependent and no large-scale gold-standard data set exists. In this paper, we present a prototypical yet robust and diverse data set for media bias research. It consists of 1,700 statements representing various media bias instances and contains labels for media bias identification on the word and sentence level. In contrast to existing research, our data incorporate background information on the participants' demographics, political ideology, and their opinion about media in general. Based on our data, we also present a way to detect bias-inducing words in news articles automatically. Our approach is feature-oriented, which provides strong descriptive and explanatory power compared to deep learning techniques. We identify and engineer various linguistic, lexical, and syntactic features that can potentially be media bias indicators. Our resource collection is the most complete within the media bias research area to the best of our knowledge. We evaluate all of our features in various combinations and determine their importance both for future research and for the task in general. We also evaluate various Machine Learning approaches with all of our features. XGBoost, a gradient-boosted decision tree implementation, yields the best results. Our approach achieves an F1-score of 0.43, a precision of 0.29, a recall of 0.77, and a ROC AUC of 0.79, which outperforms current feature-based media bias detection methods. We propose future improvements and discuss the prospects of the feature-based approach and of combining neural networks and deep learning with our current system.
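
The reported precision, recall, and F1-score can be related through the standard definition of F1 as the harmonic mean of precision and recall. The short sketch below is an illustration only and is not taken from the paper: the function names `f1_score` and `f1_range` are hypothetical, and the assumption that each reported value is rounded to two decimals is ours. It shows that the rounded precision and recall give an F1 of about 0.42, and that the reported 0.43 sits at the upper edge of what two-decimal rounding allows. Other explanations, such as averaging metrics over cross-validation folds, are possible; the abstract does not say.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_range(precision: float, recall: float, half_width: float = 0.005):
    """Bounds on F1 if both inputs were rounded to two decimals.

    F1 increases with both precision and recall, so the extremes are
    reached by shifting both inputs down or up by the rounding half-width.
    ASSUMPTION: two-decimal rounding; the abstract does not state this.
    """
    low = f1_score(precision - half_width, recall - half_width)
    high = f1_score(precision + half_width, recall + half_width)
    return low, high


# Values as reported in the abstract.
reported_precision = 0.29
reported_recall = 0.77

point = f1_score(reported_precision, reported_recall)
low, high = f1_range(reported_precision, reported_recall)

print(f"F1 from rounded values: {point:.3f}")
print(f"F1 bounds under rounding: {low:.3f} to {high:.3f}")
```

The low precision paired with high recall means the classifier flags most bias-inducing words but also many unbiased ones, which is why the F1-score stays well below the recall.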

Bibliographic details

  • Source
    Information Processing & Management | 2021, Issue 3 | 102505.1-102505.15 | 15 pages
  • Author affiliations

    University of Konstanz Universitaetsstrasse 10 DE-78464 Konstanz Germany University of Wuppertal Gaussstrasse 20 DE-42119 Wuppertal Germany;

    University of Konstanz Universitaetsstrasse 10 DE-78464 Konstanz Germany;

    University of Passau Innstrasse 41 DE-94032 Passau Germany;

    University of Konstanz Universitaetsstrasse 10 DE-78464 Konstanz Germany Heidelberg Academy of Sciences and Humanities Germany;

    University of Passau Innstrasse 41 DE-94032 Passau Germany;

    University of Wuppertal Gaussstrasse 20 DE-42119 Wuppertal Germany Heidelberg Academy of Sciences and Humanities Germany;

    University of Zurich Raemistrasse 71 CH-8006 Zuerich Switzerland Heidelberg Academy of Sciences and Humanities Germany;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Media bias; Feature engineering; Text analysis; Context analysis; News analysis; Bias data set;
