Conference paper: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Cross Language Prediction of Vandalism on Wikipedia Using Article Views and Revisions

Abstract

Vandalism is a major issue on Wikipedia, accounting for about 2% (350,000+) of edits in the first 5 months of 2012. Most vandalism is caused by humans, who can leave traces of their malicious behaviour in access and edit logs. We propose detecting vandalism using a range of classifiers in a monolingual setting, and evaluate their performance when applied across languages on two data sets: the relatively unexplored hourly count of views of each Wikipedia article, and the commonly used edit history of articles. Within the same language (English and German), these classifiers achieve up to 87% precision, 87% recall, and an F1-score of 87%. Applying these classifiers across languages achieves similarly high results of up to 83% precision, recall, and F1-score. These results show that characteristic vandal traits can be learned from view and edit patterns, and that models built in one language can be applied to other languages.
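The cross-language setup described above can be sketched in a few lines. A classifier is trained on one language's labelled edits and scored on another language's edits. The sketch also shows how the reported scores are computed for the vandalism class: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R).

This is a minimal illustration under stated assumptions, not the authors' pipeline. The following pieces are hypothetical and chosen only to keep the example self-contained:

* the nearest-centroid classifier,
* the per-language z-score standardization, and
* the helper names `score`, `standardize`, `train_centroids` and `cross_language_eval`.

Each feature vector stands in for whatever view- or edit-derived features are extracted per edit.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score(y_true: Sequence[int], y_pred: Sequence[int]) -> Scores:
    """Precision, recall and F1 for the positive class (1 = vandalism)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


def standardize(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each feature column using the statistics of X itself.

    Assumption (hypothetical): each language is normalized on its own, so a
    model trained on one wiki's scale can be applied to another wiki's scale.
    """
    cols = list(zip(*X))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by 0
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def train_centroids(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Hypothetical stand-in classifier: the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if not rows:
            raise ValueError(f"no training rows with label {label}")
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def sq_dist(label: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=sq_dist)


def cross_language_eval(src_X, src_y, tgt_X, tgt_y) -> Scores:
    """Train on a source language (e.g. English), score on a target (e.g. German)."""
    model = train_centroids(standardize(src_X), src_y)
    preds = [predict(model, x) for x in standardize(tgt_X)]
    return score(tgt_y, preds)
```

Raw view counts and edit volumes can differ in scale between language editions. Standardizing each language with its own statistics is one plausible way to make a source-language model usable on a target language. The paper's actual features and classifiers may differ.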
