GraWiTas: a Grammar-based Wikipedia Talk Page Parser

Abstract

Wikipedia offers researchers unique insights into the collaboration and communication patterns of a large, self-regulating community of editors. The main medium of direct communication between the editors of an article is the article's talk page. However, a talk page file is unstructured and therefore difficult to analyse automatically. A few parsers exist that transform it into a structured data format, but they are rarely open source, support only a limited subset of the talk page syntax (which results in loss of content), and usually support only one export format. Together with this article we offer a very fast, lightweight, open source parser with support for various output formats. In a preliminary evaluation it achieved high accuracy. The parser uses a grammar-based approach, which offers a transparent implementation and easy extensibility.
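To illustrate what turning an unstructured talk page into structured data can involve, here is a minimal Python sketch. It is not the GraWiTas implementation. It uses two regular-expression rules as a stand-in for a real grammar, and it assumes a simplified talk page syntax: replies are indented with leading colons, and each comment ends with a `[[User:...]]` signature and a UTC timestamp. The names `COMMENT`, `HEADING`, `parse_talk_page`, and `SAMPLE` are hypothetical and chosen only for this example.

```python
import json
import re

# Assumed, simplified rule for a section heading: == Title ==
HEADING = re.compile(r"^==+\s*(?P<title>.*?)\s*==+\s*$")

# Assumed, simplified rule for a signed comment on a single line:
#   <colons><text> [[User:<name>|...]] ... <HH:MM, D Month YYYY (UTC)>
COMMENT = re.compile(
    r"^(?P<indent>:*)\s*(?P<text>.*?)\s*"
    r"\[\[User:(?P<user>[^|\]]+)[^\]]*\]\].*?"
    r"(?P<date>\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\))\s*$"
)


def parse_talk_page(wikitext):
    """Return a list of comment records with reply structure.

    Lines matching neither rule are skipped, which is exactly the kind
    of content loss a fuller grammar is meant to avoid.
    """
    comments = []
    section = None
    stack = []  # (depth, comment id) of currently open ancestor comments
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            section = heading.group("title")
            stack.clear()  # a new section starts a new thread
            continue
        match = COMMENT.match(line)
        if match is None:
            continue
        depth = len(match.group("indent"))
        # Close every comment that is at the same or a deeper level.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parent_id = stack[-1][1] if stack else None
        comment_id = len(comments) + 1
        comments.append({
            "id": comment_id,
            "parent_id": parent_id,
            "section": section,
            "user": match.group("user"),
            "date": match.group("date"),
            "text": match.group("text"),
        })
        stack.append((depth, comment_id))
    return comments


SAMPLE = """\
== Sources ==
The lead needs a citation. [[User:Alice|Alice]] ([[User talk:Alice|talk]]) 10:15, 3 May 2016 (UTC)
:I added one. [[User:Bob|Bob]] ([[User talk:Bob|talk]]) 11:02, 3 May 2016 (UTC)
::Thanks! [[User:Alice|Alice]] ([[User talk:Alice|talk]]) 11:30, 3 May 2016 (UTC)
"""

if __name__ == "__main__":
    # JSON is one possible export format; the same records could be
    # written as CSV or as a reply graph.
    print(json.dumps(parse_talk_page(SAMPLE), indent=2))
```

Because the parser's output is a plain list of records, adding another export format only requires a new serialisation step. Real talk pages contain multi-line comments, unsigned comments, outdents, and templates, none of which this sketch handles.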