Machine Learning for Multimodal Interaction

Syntactic Chunking Across Different Corpora



Abstract

Syntactic chunking has been a well-defined and well-studied task since its introduction in 2000 as the CoNLL shared task. Although further effort has gone into improving chunking performance, the experimental data has, with few exceptions, been restricted to (part of) the Wall Street Journal data adopted in the shared task. It remains an open question how those successful chunking technologies extend to other data, which may differ in genre/domain and/or amount of annotation. In this paper we first train chunkers with three classifiers on three different data sets and test on four data sets. We also vary the size of the training data systematically to show the data requirements of chunkers. It turns out that there is no significant difference among these state-of-the-art classifiers; that training on plentiful data from the same corpus (Switchboard) yields results comparable to Wall Street Journal chunkers even though the underlying material is spoken; and that the results obtained from a large amount of unmatched training data can be matched with a very modest amount of matched training data.
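
As a concrete illustration of how chunking is framed as a tagging task in the CoNLL-2000 style the abstract refers to, here is a minimal Python sketch: each token carries a word, a POS tag, and a BIO chunk tag (B-NP starts a noun phrase, I-NP continues one, O is outside any chunk), and a baseline "chunker" simply maps each POS tag to its most frequent chunk tag in training. The two-sentence toy corpus and the majority-class baseline are illustrative assumptions, not the paper's classifiers or data.

from collections import Counter, defaultdict

# Toy training data in CoNLL-2000 style: (word, POS tag, BIO chunk tag).
train = [
    [("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP"),
     ("the", "DT", "B-NP"), ("deficit", "NN", "I-NP"),
     ("will", "MD", "B-VP"), ("narrow", "VB", "I-VP"), (".", ".", "O")],
    [("The", "DT", "B-NP"), ("market", "NN", "I-NP"),
     ("fell", "VBD", "B-VP"), (".", ".", "O")],
]

def train_baseline(sents):
    """Map each POS tag to its most frequent chunk tag in the training data."""
    counts = defaultdict(Counter)
    for sent in sents:
        for _, pos, chunk in sent:
            counts[pos][chunk] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

def chunk(model, tagged_sent):
    """Assign a BIO chunk tag to every (word, POS) pair; unseen POS gets O."""
    return [(w, pos, model.get(pos, "O")) for w, pos in tagged_sent]

model = train_baseline(train)
test = [("The", "DT"), ("deficit", "NN"), ("narrowed", "VBD"), (".", ".")]
for w, pos, tag in chunk(model, test):
    print(f"{w}\t{pos}\t{tag}")

Real chunkers, including the classifier-based ones compared in the paper, typically replace this per-POS lookup with a learned model over richer features (surrounding words, POS tags, and previous chunk tags), but the input/output format is the same, which is what makes cross-corpus training and testing straightforward.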