Source: International Conference on Computational Linguistics

A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia Articles

Abstract

In this paper, we present a study of the collaborative writing process in Wikipedia. Our work is based on a corpus of 1,995 edits obtained from 891 article revisions in the English Wikipedia. We propose a 21-category classification scheme for edits based on Faigley and Witte's (1981) model. Example edit categories include spelling error corrections and vandalism. In a manual multi-label annotation study with 3 annotators, we obtain an inter-annotator agreement of α = 0.67. We further analyze the distribution of edit categories for distinct stages in the revision history of 10 featured and 10 non-featured articles. Our results show that the information content in featured articles tends to become more stable after their promotion. By contrast, this is not true for non-featured articles. We make the resulting corpus and the annotation guidelines freely available.
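
The abstract reports agreement as α = 0.67 without naming the coefficient or the distance measure. As a rough illustration only, the sketch below assumes it is Krippendorff's α and computes the simplest variant: nominal labels, one label per annotator per edit. The multi-label setting described in the abstract would need a set-valued distance instead, which is not shown. The function name `krippendorff_alpha_nominal` and the toy labels are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (assumed variant, single label).

    `units` is a list of annotated items; each item is a list holding one
    label per annotator, with None marking a missing annotation.
    """
    # Coincidence matrix: each ordered pair of labels given to the same
    # unit by different annotators contributes 1 / (m - 1), where m is
    # the number of labels the unit received.
    coincidences = Counter()
    for labels in units:
        values = [v for v in labels if v is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than two labels are not pairable
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable labels.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable annotations")

    # Disagreement counts; the common factor 1/n cancels in the ratio.
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Toy data: four edits, three annotators, hypothetical labels.
    toy_edits = [
        ["spelling", "spelling", "spelling"],
        ["vandalism", "vandalism", "spelling"],
        ["vandalism", "vandalism", None],
        ["spelling", "spelling", "spelling"],
    ]
    print(round(krippendorff_alpha_nominal(toy_edits), 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so under this assumption the reported 0.67 would sit between the two.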
