Rethinking The Corpus: Moving towards Dynamic Linguistic Resources

Abstract

The corpus is an invaluable resource in Spoken and Natural Language Processing. Consistent data sets have allowed for empirical evaluation of competing algorithms. The sharing of high-quality annotated linguistic data has enabled participation and experimentation by a wide range of researchers. However, despite dubbing these annotations as "gold-standard", many corpora contain labeling errors and idiosyncrasies. The current view of the corpus as a static resource makes correction of errors and other modifications prohibitively difficult. In this paper, a perspective of the corpus as dynamically changing is advanced. We highlight the problems of the static view of the corpus through case studies of the Penn Treebank, Switchboard, Hub-4 and Boston University Radio News Corpus. We propose the use of version control software as a mechanism to facilitate this dynamic view.

Bibliographic details

Source
Annual conference of the International Speech Communication Association | 2012 | pp. 1390-1393 | 4 pages
Author
Andrew Rosenberg
Original format
PDF
Keywords
Linguistic Resources; Opinion paper

Similar literature

1. The undergraduate learner translator corpus: a new resource for translation studies and computational linguistics [J]. Reem F. Alfuraih. Language Resources and Evaluation, 2020, No. 3.
2. A digital corpus resource of authentic anonymized French text messages: 88milSMS-What about transcoding and linguistic annotation? [J]. Panckhurst Rachel. Literary & linguistic computing, 2017, No. aprasuppla1.
3. The Vienna-Oxford International Corpus of English (VOICE) A linguistic resource for exploring English as a lingua franca [J]. Angelika Breiteneder, Theresa Klimpfinger, Stefan Majewski. OEGAI journal, 2009, No. 1.
4. Rethinking The Corpus: Moving towards Dynamic Linguistic Resources [C]. Andrew Rosenberg. INTERSPEECH 2012, 2012.
5. Rethinking "Rethinking Chartism": Revisiting the linguistic turn on the failure of Chartism. [D]. Scarborough, Benjamin G. 2010.
6. Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources [O]. Valery Solovyev, Vladimir Ivanov. 2016.
7. Linguistically Annotated Corpus as an Invaluable Resource for Advancements in Linguistic Research: A Case Study [O]. Hajič Jan, Hajičová Eva, Mírovský Jiří. 2016.
