Any domain parsing: Automatic domain adaptation for natural language parsing.

Abstract

Current efforts in syntactic parsing are largely data-driven. These methods require labeled examples of syntactic structures to learn statistical patterns governing these structures. Labeled data typically requires expert annotators which makes it both time consuming and costly to produce. Furthermore, once training data has been created for one textual domain, portability to similar domains is limited. This domain-dependence has inspired a large body of work since syntactic parsing aims to capture syntactic patterns across an entire language rather than just a specific domain.

The simplest approach to this task is to assume that the target domain is essentially the same as the source domain. No additional knowledge about the target domain is included. A more realistic approach assumes that only raw text from the target domain is available. This assumption lends itself well to semi-supervised learning methods since these utilize both labeled and unlabeled examples.

This dissertation focuses on a family of semi-supervised methods called self-training. Self-training creates semi-supervised learners from existing supervised learners with minimal effort. We first show results on self-training for constituency parsing within a single domain. While self-training has failed here in the past, we present a simple modification which allows it to succeed, producing state-of-the-art results for English constituency parsing. Next, we show how self-training is beneficial when parsing across domains and helps further when raw text is available from the target domain. One of the remaining issues is that one must choose a training corpus appropriate for the target domain or performance may be severely impaired. Humans can do this in some situations, but this strategy becomes less practical as we approach larger data sets. We present a technique, Any Domain Parsing, which automatically detects useful source domains and mixes them together to produce a customized parsing model. The resulting models perform almost as well as the best seen parsing models (oracle) for each target domain. As a result, we have a fully automatic syntactic constituency parser which can produce high-quality parses for all types of text, regardless of domain.

Bibliographic record

  • Author: McClosky, David
  • Author affiliation: Brown University
  • Degree-granting institution: Brown University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 86 p.
  • Total pages: 86
  • Original format: PDF
  • Language of text: English

  • Date added to database: 2022-08-17 11:37:27
