
Syntactic Chunking Across Different Corpora



Abstract

Syntactic chunking has been a well-defined and well-studied task since its introduction in 2000 as the CoNLL shared task. Though further effort has been spent on improving chunking performance, the experimental data has been restricted, with few exceptions, to (part of) the Wall Street Journal data adopted in the shared task. It remains an open question how those successful chunking technologies extend to other data, which may differ in genre/domain and/or amount of annotation. In this paper we first train chunkers with three classifiers on three different data sets and test on four data sets. We also vary the size of the training data systematically to show the data requirements of chunkers. It turns out that there is no significant difference between these state-of-the-art classifiers; that training on plentiful data from the same corpus (Switchboard) yields results comparable to Wall Street Journal chunkers even when the underlying material is spoken; and that the results achieved with a large amount of unmatched training data can be obtained with a very modest amount of matched training data.
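To make the task concrete: syntactic chunking is standardly cast as BIO sequence labeling over (word, POS) pairs, which is how the CoNLL-2000 shared-task data is annotated. Below is a minimal sketch of that framing using the CoNLL-2000 corpus as shipped with NLTK; the unigram-over-POS baseline is purely illustrative and is not one of the three classifiers compared in the paper.

from nltk import UnigramTagger, download
from nltk.chunk import tree2conlltags
from nltk.corpus import conll2000

download('conll2000', quiet=True)

# Each sentence is stored as a chunk tree; tree2conlltags flattens it
# into (word, POS, BIO-chunk-tag) triples, e.g. ('the', 'DT', 'B-NP').
train = [tree2conlltags(t) for t in conll2000.chunked_sents('train.txt')]
test = [tree2conlltags(t) for t in conll2000.chunked_sents('test.txt')]

# Illustrative baseline: predict the most frequent chunk tag per POS tag.
tagger = UnigramTagger(
    [[(pos, chunk) for _, pos, chunk in sent] for sent in train]
)

correct = total = 0
for sent in test:
    pred = tagger.tag([pos for _, pos, _ in sent])
    for (_, _, gold), (_, guess) in zip(sent, pred):
        correct += (gold == guess)
        total += 1
print(f"per-token BIO accuracy: {correct / total:.3f}")

Chunking results are conventionally reported as chunk-level F1 rather than the per-token accuracy printed here; this sketch only shows the data format and the labeling setup.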
