Journal: IEEE Transactions on Knowledge and Data Engineering

Cross-Language Learning from Bots and Users to Detect Vandalism on Wikipedia

Abstract

Vandalism, the malicious modification of articles, is a serious problem for open access encyclopedias such as Wikipedia. The use of counter-vandalism bots is changing the way Wikipedia identifies and bans vandals, but their contributions are often neither considered nor discussed. In this paper, we propose novel text features that capture the invariants of vandalism across five languages, and we use them to learn and compare the contributions of bots and users in the task of identifying vandalism. We construct computationally efficient features that highlight the contributions of bots and users and that generalize across languages. We evaluate our proposed features through classification performance on revisions from five Wikipedia language editions, totaling over 500 million revisions of over nine million articles. As a comparison, we evaluate these features on the small PAN Wikipedia vandalism data sets used by previous research, which contain approximately 62,000 revisions. We show differences in the performance of our features on the PAN and the full Wikipedia data sets. With appropriate text features, vandalism bots can be effective across different languages while learning from only one language. Our ultimate aim is to build the next generation of vandalism detection bots based on machine learning approaches that can work effectively across many languages.