Linguistically Motivated Combinatory Categorial Grammar Induction.

Abstract

Combinatory Categorial Grammar (CCG) is a widely studied grammar formalism that has been used in a variety of NLP applications, e.g., semantic parsing and machine translation. One key challenge in building effective CCG parsers is the lack of labeled training data, which is expensive to produce manually. Instead of relying on manual annotation, researchers have developed automated approaches for inducing the grammars. These algorithms learn lexical entries that define the syntax and semantics of individual words, along with probabilistic models that rank the set of possible parses for each sentence. Various types of universal or language-specific prior knowledge and supervision signals can be exploited to prune the grammar search space and constrain parameter estimation.

In this thesis, we introduce new methods for inducing linguistically motivated grammars that generalize well from small amounts of labeled training data. We first present a CCG grammar induction scheme for semantic parsing, in which the grammar is constrained by modeling a wide range of linguistic constructions. We then introduce a new lexical generalization model that abstracts over systematic morphological, syntactic, and semantic variations in languages. Finally, we describe a weakly supervised approach for inducing broad-scale CCG syntactic structures for multiple languages. Such approaches would be most useful for low-resource languages, as well as for domains where gathering sufficient training data is prohibitively expensive.
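
To make "lexical entries" and "probabilistic models that rank the set of possible parses" concrete, here is a minimal illustrative sketch in Python. It is not the method from this thesis. Each word maps to candidate (category, semantics, weight) triples, the kind of entries an induction algorithm might propose. A CKY-style chart combines adjacent constituents with the two basic CCG rules, forward application (X/Y Y => X) and backward application (Y X\Y => X), and complete parses are ranked by the sum of their lexical weights. The toy lexicon, the weights, and the names `combine`, `parse`, and `best_parses` are hypothetical. Real CCG parsers also use composition and type-raising and learn richer, feature-based models.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass(frozen=True)
class Atom:
    """Atomic category such as S or NP."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Functor:
    """X/Y seeks a Y to its right; X\\Y seeks a Y to its left."""
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"

    def __str__(self):
        return f"({self.result}{self.slash}{self.arg})"


Cat = Union[Atom, Functor]

S, NP = Atom("S"), Atom("NP")
TV = Functor(Functor(S, "\\", NP), "/", NP)  # transitive verb: (S\NP)/NP

# Hypothetical lexicon: two competing candidate entries for "likes",
# with made-up weights, as an induction step might hypothesize.
LEXICON = {
    "John": [(NP, "john", 0.0)],
    "Mary": [(NP, "mary", 0.0)],
    "likes": [
        (TV, lambda y: lambda x: ("likes", x, y), 1.0),   # subject first
        (TV, lambda y: lambda x: ("likes", y, x), -1.0),  # arguments swapped
    ],
}


def combine(left, right):
    """Apply forward and backward application to two adjacent chart items."""
    (lc, ls, lw), (rc, rs, rw) = left, right
    out = []
    # Forward application: X/Y  Y  =>  X
    if isinstance(lc, Functor) and lc.slash == "/" and lc.arg == rc:
        out.append((lc.result, ls(rs), lw + rw))
    # Backward application: Y  X\Y  =>  X
    if isinstance(rc, Functor) and rc.slash == "\\" and rc.arg == lc:
        out.append((rc.result, rs(ls), lw + rw))
    return out


def parse(words, lexicon):
    """CKY chart parsing; returns every item spanning the whole sentence."""
    n = len(words)
    chart = {(i, i + 1): list(lexicon[w]) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = []
            for k in range(i + 1, j):
                for left, right in product(chart[(i, k)], chart[(k, j)]):
                    cell.extend(combine(left, right))
            chart[(i, j)] = cell
    return chart[(0, n)]


def best_parses(words, lexicon, goal=S):
    """Keep complete parses of the goal category, highest score first."""
    items = [item for item in parse(words, lexicon) if item[0] == goal]
    return sorted(items, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for cat, sem, score in best_parses("John likes Mary".split(), LEXICON):
        print(cat, sem, score)
```

Running the script prints `S ('likes', 'john', 'mary') 1.0` followed by `S ('likes', 'mary', 'john') -1.0`. Both candidate entries for "likes" yield a well-formed S, and only the weights separate the intended reading from the swapped one. Estimating such weights from limited supervision, and pruning implausible candidate entries before they reach the chart, is the problem the abstract describes.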

Bibliographic details

  • Author: Wang, Adrienne X.
  • Affiliation: University of Washington
  • Degree-granting institution: University of Washington
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 108
  • Original format: PDF
  • Language: English
