
Treebank-Based Acquisition of a Chinese Lexical-Functional Grammar


Abstract

Scaling wide-coverage, constraint-based grammars such as Lexical-Functional Grammars (LFG) (Kaplan and Bresnan, 1982; Bresnan, 2001) or Head-Driven Phrase Structure Grammars (HPSG) (Pollard and Sag, 1994) from fragments to naturally occurring unrestricted text is knowledge-intensive, time-consuming and (often prohibitively) expensive. A number of researchers have recently presented methods to automatically acquire wide-coverage, probabilistic constraint-based grammatical resources from treebanks (Cahill et al., 2002; Cahill et al., 2003; Cahill et al., 2004; Miyao et al., 2003; Miyao et al., 2004; Hockenmaier and Steedman, 2002; Hockenmaier, 2003), addressing the knowledge acquisition bottleneck in constraint-based grammar development. Research to date has concentrated on English and German. In this paper we report on an experiment to induce wide-coverage, probabilistic LFG grammatical and lexical resources for Chinese from the Penn Chinese Treebank (CTB) (Xue et al., 2002) based on an automatic f-structure annotation algorithm. Currently 96.751% of the CTB trees receive a single, covering and connected f-structure, 0.112% do not receive an f-structure due to feature clashes, and 3.137% are associated with multiple f-structure fragments. From the f-structure-annotated CTB we extract a total of 12975 lexical entries with 20 distinct subcategorisation frame types. Of these, 3436 are verbal entries with a total of 11 different frame types. We extract a number of PCFG-based LFG approximations.
Currently our best automatically induced grammars achieve an f-score of 81.57% against the trees in unseen articles 301-325. Against the dependencies derived from the f-structures automatically generated for the original trees in articles 301-325, they achieve f-scores of 86.06% (all grammatical functions) and 73.98% (preds-only). Against the dependencies derived from the manually annotated gold-standard f-structures for 50 trees randomly selected from articles 301-325, they achieve 82.79% (all grammatical functions) and 67.74% (preds-only).
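To make the two dependency-based figures concrete, here is a minimal sketch of how an f-score over dependency triples could be computed, once over all grammatical functions and once in a preds-only setting. The triple format `(relation, head, dependent)`, the toy English data, the function names and the set of atomic features dropped for preds-only are all assumptions made for illustration; the abstract does not describe the evaluation software that was actually used.

```python
from collections import Counter


def f_score(gold, predicted):
    """Return (precision, recall, f1) over multisets of dependency triples."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a triple is matched at most as often as it
    # occurs on both sides.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / sum(pred_counts.values()) if pred_counts else 0.0
    recall = matched / sum(gold_counts.values()) if gold_counts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def preds_only(triples, atomic_features=frozenset({"num", "pers", "aspect"})):
    """Drop triples whose relation is an atomic-valued feature.

    The feature names are hypothetical placeholders, not the feature
    inventory of the annotated treebank.
    """
    return [t for t in triples if t[0] not in atomic_features]


# Toy data (hypothetical): gold and predicted triples for one sentence.
gold = [
    ("subj", "read", "john"),
    ("obj", "read", "book"),
    ("num", "book", "sg"),
    ("adjunct", "read", "today"),
]
predicted = [
    ("subj", "read", "john"),
    ("obj", "read", "book"),
    ("num", "book", "pl"),
    ("adjunct", "book", "today"),
]

all_gf = f_score(gold, predicted)
preds = f_score(preds_only(gold), preds_only(predicted))
print("all grammatical functions:", all_gf)
print("preds-only:", preds)
```

In this toy case the two settings score differently only because the atomic `num` triple is excluded from the preds-only comparison; which setting scores higher in general depends on where the errors fall.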
