Machine Learning for Multimodal Interaction

Syntactic Chunking Across Different Corpora



Abstract

Syntactic chunking has been a well-defined and well-studied task since its introduction in 2000 as the CoNLL shared task. Although further effort has gone into improving chunking performance, the experimental data has, with few exceptions, been restricted to (part of) the Wall Street Journal data adopted in the shared task. It remains an open question how those successful chunking technologies extend to other data, which may differ in genre/domain and/or amount of annotation. In this paper we first train chunkers with three classifiers on three different data sets and test on four data sets. We also vary the size of the training data systematically to show the data requirements of chunkers. It turns out that there is no significant difference among these state-of-the-art classifiers; that training on plentiful data from the same corpus (Switchboard) yields results comparable to Wall Street Journal chunkers even though the underlying material is spoken; and that the results obtained from a large amount of unmatched training data can be matched with a very modest amount of matched training data.
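
As a concrete illustration of how chunking is framed as a tagging task in the CoNLL-2000 style the abstract refers to, here is a minimal Python sketch: each token carries a word, a POS tag, and a BIO chunk tag (B-NP starts a noun phrase, I-NP continues one, O is outside any chunk), and a baseline "chunker" simply maps each POS tag to its most frequent chunk tag in training. The two-sentence toy corpus and the majority-class baseline are illustrative assumptions, not the paper's classifiers or data.

from collections import Counter, defaultdict

# Toy training data in CoNLL-2000 style: (word, POS tag, BIO chunk tag).
train = [
    [("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP"),
     ("the", "DT", "B-NP"), ("deficit", "NN", "I-NP"),
     ("will", "MD", "B-VP"), ("narrow", "VB", "I-VP"), (".", ".", "O")],
    [("The", "DT", "B-NP"), ("market", "NN", "I-NP"),
     ("fell", "VBD", "B-VP"), (".", ".", "O")],
]

def train_baseline(sents):
    """Map each POS tag to its most frequent chunk tag in the training data."""
    counts = defaultdict(Counter)
    for sent in sents:
        for _, pos, chunk in sent:
            counts[pos][chunk] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

def chunk(model, tagged_sent):
    """Assign a BIO chunk tag to every (word, POS) pair; unseen POS gets O."""
    return [(w, pos, model.get(pos, "O")) for w, pos in tagged_sent]

model = train_baseline(train)
test = [("The", "DT"), ("deficit", "NN"), ("narrowed", "VBD"), (".", ".")]
for w, pos, tag in chunk(model, test):
    print(f"{w}\t{pos}\t{tag}")

Real chunkers, including the classifier-based ones compared in the paper, typically replace this per-POS lookup with a learned model over richer features (surrounding words, POS tags, and previous chunk tags), but the input/output format is the same, which is what makes cross-corpus training and testing straightforward.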