The WeSearch Corpus, Treebank, and Treecache: A Comprehensive Sample of User-Generated Content

Abstract

We present the WeSearch Data Collection (WDC), a freely redistributable, partly annotated, comprehensive sample of User-Generated Content. The WDC contains data extracted from a range of genres of varying formality (user forums, product review sites, blogs and Wikipedia) and covers two different domains (NLP and Linux). In this article, we describe the data selection and extraction process, with a focus on the extraction of linguistic content from different sources. We present the format of syntacto-semantic annotations found in this resource and report initial parsing results for these data, as well as some reflections following a first round of treebanking.

Bibliographic details

Source
International Conference on Language Resources and Evaluation (LREC), 2012, pp. 1829-1835 (7 pages)
Authors
Jonathon Read; Dan Flickinger; Rebecca Dridan; Stephan Oepen; Lilja Øvrelid
Original format: PDF
Keywords
User-Generated Content; Open-Source Corpus; Manually Validated Treebank; Automatically Created Treecache
