Models for improved tractability and accuracy in dependency parsing.

Abstract

Automatic syntactic analysis of natural language is one of the fundamental problems in natural language processing. Dependency parses (directed trees in which edges represent the syntactic relationships between the words in a sentence) have been found to be particularly useful for machine translation, question answering, and other practical applications.

For English dependency parsing, we show that models and features compatible with how conjunctions are represented in treebanks yield a parser with state-of-the-art overall accuracy and substantial improvements in the accuracy of conjunctions.

For languages other than English, dependency parsing has often been formulated as either searching over trees without any crossing dependencies (projective trees) or searching over all directed spanning trees. The former sacrifices the ability to produce many natural language structures; the latter is NP-hard in the presence of features with scopes over siblings or grandparents in the tree.

This thesis explores alternative ways to simultaneously produce crossing dependencies in the output and use models that parametrize over multiple edges.

Gap inheritance is introduced in this thesis and quantifies the nesting of subtrees over intervals. The thesis provides O(n^6) and O(n^5) edge-factored parsing algorithms for two new classes of trees based on this property, and extends the latter to include grandparent factors.

This thesis then defines 1-Endpoint-Crossing trees, in which for any edge that is crossed, all other edges that cross that edge share an endpoint. This property covers 95.8% or more of dependency parses across a variety of languages. A crossing-sensitive factorization introduced in this thesis generalizes a commonly used third-order factorization (capable of scoring triples of edges simultaneously).

This thesis provides exact dynamic programming algorithms that find the optimal 1-Endpoint-Crossing tree under either an edge-factored model or this crossing-sensitive third-order model in O(n^4) time, orders of magnitude faster than other mildly non-projective parsing algorithms and identical to the parsing time for projective trees under the third-order model. The implemented parser is significantly more accurate than the third-order projective parser under many experimental settings and significantly less accurate on none.
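
The two structural properties named in the abstract (projectivity and the 1-Endpoint-Crossing property) can be illustrated with a small checker. This is a minimal sketch, not the thesis's O(n^4) parsing algorithm: it only tests whether a given tree has each property, by brute force over edge pairs. The head-array encoding (`heads[i]` is the head of word `i + 1`, with `0` as an artificial root placed to the left of the sentence), the choice to count edges from that root when looking for crossings, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def edges_from_heads(heads):
    """Return undirected edges (left, right) for a head array.

    Assumed encoding: heads[i] is the head of word i + 1, and 0 is an
    artificial root positioned before the first word.
    """
    return [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]


def crosses(e, f):
    """Two edges cross when their endpoints strictly interleave."""
    (a, b), (c, d) = e, f
    return a < c < b < d or c < a < d < b


def is_projective(heads):
    """A tree is projective here if no two of its edges cross."""
    edges = edges_from_heads(heads)
    return not any(crosses(e, f) for e, f in combinations(edges, 2))


def is_one_endpoint_crossing(heads):
    """For every crossed edge, all edges crossing it must share an endpoint."""
    edges = edges_from_heads(heads)
    for e in edges:
        crossing = [f for f in edges if crosses(e, f)]
        if not crossing:
            continue
        # Intersect the endpoint sets of all edges that cross e.
        common = set(crossing[0])
        for f in crossing[1:]:
            common &= set(f)
        if not common:
            return False
    return True


# Hypothetical example trees (not taken from the thesis).
projective_tree = [2, 0, 2]            # no crossing edges
mildly_nonprojective = [3, 4, 0, 3]    # edge (2, 4) is crossed by (1, 3) and (0, 3), which share endpoint 3
not_one_endpoint = [0, 5, 1, 6, 3, 5]  # edge (2, 5) is crossed by (1, 3) and (4, 6), which share no endpoint

print(is_projective(projective_tree), is_one_endpoint_crossing(projective_tree))
print(is_projective(mildly_nonprojective), is_one_endpoint_crossing(mildly_nonprojective))
print(is_projective(not_one_endpoint), is_one_endpoint_crossing(not_one_endpoint))
```

Every projective tree trivially satisfies the 1-Endpoint-Crossing condition, since no edge is crossed at all; the second example shows a tree with crossings that still satisfies it, and the third shows one that does not.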

Record details

  • Author

    Pitler, Emily.

  • Author affiliation

    University of Pennsylvania.

  • Degree-granting institution: University of Pennsylvania.
  • Subject: Computer science.
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 166 p.
  • Total pages: 166
  • Original format: PDF
  • Language of text: English
