
CREDIBILITY ASSESSMENT FOR ARABIC MICRO-BLOGS USING NOISY LABELS


Abstract

Because they are open and have low publishing barriers, User-Generated Content (UGC) platforms facilitate the creation of huge amounts of data, a substantial share of which is inaccurate. Misleading, questionable and inaccurate content may have detrimental effects on people's beliefs and decision-making and may create public disturbance. Consequently, there is a significant need to evaluate information coming from UGC platforms in order to differentiate credible information from misinformation and rumours. In this thesis, we present the need for research on the credibility of online Arabic information and argue that extending existing automated credibility assessment approaches with an extra step that evaluates the labellers will lead to a more robust dataset for building the credibility classification model.

This research focuses on modelling the credibility of Arabic information when labellers disagree in their credibility judgments and the ground truth of credibility is not absolute. First, to achieve this goal, the study employs crowdsourcing, whereby users explicitly express their opinions about the credibility of a set of tweets. This information, coupled with data about the tweets' features, enables us to identify the message features most used in determining information credibility levels. Then, experiments based on both statistical analysis of the features' distributions and machine learning methods are performed to predict and classify messages' credibility levels. A novel credibility assessment model is proposed that integrates the labellers' reliability weights when deriving the credibility labels for the messages in the training and testing datasets. This credibility model primarily uses similarity and accuracy rating measurements to weight the labellers.

To evaluate the proposed model, we compare the labels obtained from expert labellers with those from the weighted crowd labellers. Empirical evidence suggests that, measured against the experts' ratings, the credibility model is superior to the commonly used majority voting baseline. The experimental results show a reduction in the effect of unreliable labellers' credibility judgments and a moderate improvement in the credibility classification results.
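The contrast between majority voting and reliability-weighted labelling can be illustrated with a minimal sketch. This is not the thesis's model: it assumes each labeller's weight is simply their accuracy on a small expert-labelled subset, and it leaves out the similarity measurement mentioned in the abstract. All names (`labeller_weights`, `weighted_vote`, the labeller IDs and the toy data) are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label in a list (unweighted baseline)."""
    return Counter(labels).most_common(1)[0][0]


def labeller_weights(crowd, gold):
    """Assumed weighting: a labeller's accuracy on expert-labelled messages.

    crowd: {labeller_id: {message_id: label}}
    gold:  {message_id: expert_label}
    """
    weights = {}
    for labeller, answers in crowd.items():
        scored = [m for m in answers if m in gold]
        correct = sum(answers[m] == gold[m] for m in scored)
        weights[labeller] = correct / len(scored) if scored else 0.0
    return weights


def weighted_vote(crowd, weights, message_id):
    """Return the label with the largest total labeller weight, or None."""
    totals = defaultdict(float)
    for labeller, answers in crowd.items():
        if message_id in answers:
            totals[answers[message_id]] += weights[labeller]
    return max(totals, key=totals.get) if totals else None


# Toy data (invented for illustration): t1 and t2 have expert labels, t3 does not.
gold = {"t1": "credible", "t2": "not_credible"}
crowd = {
    "a": {"t1": "credible", "t2": "not_credible", "t3": "credible"},
    "b": {"t1": "credible", "t2": "credible", "t3": "not_credible"},
    "c": {"t1": "not_credible", "t2": "credible", "t3": "not_credible"},
}

weights = labeller_weights(crowd, gold)
t3_labels = [answers["t3"] for answers in crowd.values()]
print(weights)
print("majority:", majority_vote(t3_labels))
print("weighted:", weighted_vote(crowd, weights, "t3"))
```

In this toy example the two less reliable labellers outvote the reliable one under majority voting, while the weighted vote follows the labeller who agreed with the experts. A fuller treatment would also need a rule for ties and for labellers with no expert-labelled messages.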

Bibliographic record

  • Author

    Almansour Amal

  • Year: 2016
  • Original format: PDF
  • Language of text: eng
