Journal: Computational Linguistics

Word Segmentation, Unknown-word Resolution, and Morphological Agreement in a Hebrew Parsing System



Abstract

We present a constituency parsing system for Modern Hebrew. The system is based on the PCFG-LA parsing method of Petrov et al. (2006), which we extend in several ways to accommodate the specific properties of Hebrew as a morphologically rich language with a small treebank. We show that parsing performance can be improved by using a language resource external to the treebank, specifically a lexicon-based morphological analyzer. We present a computational model for interfacing the external lexicon with a treebank-based parser, including the common case where the lexicon and the treebank follow different annotation schemes. We show that Hebrew word segmentation and constituency parsing can be performed jointly using CKY lattice parsing. Performing the tasks jointly is effective and substantially outperforms a pipeline-based model. We suggest modeling grammatical agreement in a constituency-based parser as a filter mechanism that is orthogonal to the grammar, and present a concrete implementation of the method. Although the constituency parser does not make many agreement mistakes to begin with, the filter mechanism is effective in fixing the agreement mistakes that the parser does make.

These contributions extend beyond Hebrew processing and are of general applicability to the NLP community. Hebrew is a specific case of a morphologically rich language, and the ideas presented in this work are also useful for processing other languages, including English. The lattice-based parsing methodology is useful in any case where the input is uncertain. Extending the lexical coverage of a treebank-derived parser using an external lexicon is relevant for any language with a small treebank.
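To make the idea of a segmentation lattice more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Edge` type, the `all_paths` function, and the toy token with its candidate splits are all hypothetical and not taken from the paper. The sketch only shows how one ambiguous token can be encoded as a small graph whose paths are the candidate segmentations. A lattice parser would score those paths together with the parse trees instead of committing to one segmentation up front.

```python
from typing import Dict, List, Tuple

# Hypothetical lattice edge: (start node, end node, candidate segment).
Edge = Tuple[int, int, str]


def all_paths(edges: List[Edge], start: int, goal: int) -> List[List[str]]:
    """Enumerate every segmentation, i.e. every path from start to goal."""
    # Index edges by their start node for quick lookup.
    outgoing: Dict[int, List[Edge]] = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)

    def walk(node: int) -> List[List[str]]:
        if node == goal:
            return [[]]  # one empty continuation: the path is complete
        paths: List[List[str]] = []
        for _, nxt, segment in outgoing.get(node, []):
            for rest in walk(nxt):
                paths.append([segment] + rest)
        return paths

    return walk(start)


# Toy, illustrative lattice for a single transliterated token "bclm".
# The candidate splits are assumptions made up for this example.
toy_edges: List[Edge] = [
    (0, 3, "bclm"),  # the whole token as one segment
    (0, 1, "b"),     # a one-letter prefix ...
    (1, 3, "clm"),   # ... followed by one segment
    (1, 2, "cl"),    # ... or by two shorter segments
    (2, 3, "m"),
]

segmentations = all_paths(toy_edges, start=0, goal=3)
for segmentation in segmentations:
    print(" + ".join(segmentation))
```

In a pipeline model, one of these paths would be chosen before parsing begins. In the joint setting described in the abstract, all of them remain available to the parser, which is why the same approach applies whenever the input is uncertain.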

