Journal: Information Processing & Management

Understanding and predicting Web content credibility using the Content Credibility Corpus


Abstract

The goal of our research is to create a predictive model of Web content credibility evaluations, based on human evaluations. The model has to be based on a comprehensive set of independent factors that can be used both to guide users' credibility evaluations in crowdsourced systems like WOT and to design machine classifiers of Web content credibility. The factors described in this article are based on empirical data. We have created a dataset from an extensive crowdsourced Web credibility assessment study (over 15,000 evaluations of over 5,000 Web pages by over 2,000 participants). First, online participants evaluated a multi-domain corpus of selected Web pages. Using the acquired data and text mining techniques, we prepared a code book and conducted a second crowdsourcing round to label the textual justifications given for the earlier responses. We have extended the list of significant credibility assessment factors described in previous research and analyzed their relationships to credibility evaluation scores. The discovered factors that affect Web content credibility evaluations are also only weakly correlated with one another, which makes them more useful for modeling and predicting credibility evaluations. Based on the newly identified factors, we propose a predictive model for Web content credibility. The model can be used to determine the significance and impact of the discovered factors on credibility evaluations. These findings can guide future research on the design of automatic or semiautomatic systems that support Web content credibility evaluation. This study also contributes the largest credibility dataset currently publicly available for research: the Content Credibility Corpus (C3).