Journal: Computational Linguistics

Feature Forest Models for Probabilistic HPSG Parsing



Abstract

Probabilistic modeling of lexicalized grammars is difficult because these grammars exploit complicated data structures, such as typed feature structures. This prevents us from applying common methods of probabilistic modeling in which a complete structure is divided into substructures under the assumption of statistical independence among substructures. For example, part-of-speech tagging of a sentence is decomposed into tagging of each word, and CFG parsing is split into applications of CFG rules. These methods have relied on the structure of the target problem, namely lattices or trees, and cannot be applied to graph structures including typed feature structures. This article proposes the feature forest model as a solution to the problem of probabilistic modeling of complex data structures including typed feature structures. The feature forest model provides a method for probabilistic modeling without the independence assumption when probabilistic events are represented with feature forests. Feature forests are generic data structures that represent ambiguous trees in a packed forest structure. Feature forest models are maximum entropy models defined over feature forests. A dynamic programming algorithm is proposed for maximum entropy estimation without unpacking feature forests. Thus, probabilistic modeling of any data structure is possible when it is represented by a feature forest. This article also describes methods for representing HPSG syntactic structures and predicate-argument structures with feature forests. Hence, we describe a complete strategy for developing probabilistic models for HPSG parsing. The effectiveness of the proposed methods is empirically evaluated through parsing experiments on the Penn Treebank, and the promise of applicability to parsing of real-world sentences is discussed.
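
The abstract's central idea, computing maximum entropy quantities by dynamic programming over a packed forest instead of enumerating every tree, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy forest, the node names, the log-scores (standing in for weighted feature sums), and the function names. The sketch computes only the normalization constant by an inside-style recursion over conjunctive and disjunctive nodes and checks it against brute-force unpacking; full parameter estimation would also need feature expectations, which are not shown here.

```python
import math
from functools import lru_cache
from itertools import product

# Hypothetical toy forest (not from the article).
# Conjunctive node: name -> (log-score, daughter disjunctive nodes).
# The log-score stands in for the weighted feature sum at that node.
CONJ = {
    "c1": (0.5, ("d1", "d2")),
    "c2": (-0.2, ("d2",)),
    "c3": (0.1, ()),
    "c4": (0.3, ()),
    "c5": (-0.4, ()),
    "c6": (0.2, ()),
}
# Disjunctive node: name -> alternative conjunctive nodes (a choice point).
DISJ = {
    "root": ("c1", "c2"),
    "d1": ("c3", "c4"),
    "d2": ("c5", "c6"),
}


@lru_cache(maxsize=None)
def inside_conj(c):
    """Node score times the inside values of all daughters."""
    log_score, daughters = CONJ[c]
    total = math.exp(log_score)
    for d in daughters:
        total *= inside_disj(d)
    return total


@lru_cache(maxsize=None)
def inside_disj(d):
    """Sum over the alternatives packed at this choice point."""
    return sum(inside_conj(c) for c in DISJ[d])


def unpack(d):
    """Brute force: yield the total log-score of every tree rooted at d."""
    for c in DISJ[d]:
        log_score, daughters = CONJ[c]
        for combo in product(*(list(unpack(x)) for x in daughters)):
            yield log_score + sum(combo)


def partition_packed():
    """Normalizer computed on the packed forest (each node visited once)."""
    return inside_disj("root")


def partition_unpacked():
    """Normalizer computed by enumerating every unpacked tree."""
    return sum(math.exp(s) for s in unpack("root"))


if __name__ == "__main__":
    print(partition_packed(), partition_unpacked())
```

The memoized recursion visits each node once, so its cost grows with the size of the packed forest, whereas the brute-force version grows with the number of trees, which can be exponential in sentence length.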
